Preface


“If there is any great secret of success in life, it lies in the ability 
to put yourself in the other person's place and to see things from 
his point of view - as well as your own.”


- Henry Ford (1863-1947)


Biomedical engineering is not the ‘new’ discipline it is often claimed to be.
Electric currents in biology were recognized long before electric currents in 
wires. Medicine has relied on mechanical, electrical and more recently elec- 
tronic equipment almost as a matter of course. Engineering in therapy is not 
a new concept either. Passive mechanical implants such as hip-joint replace- 
ment are recorded as early as the 1890s. Functional electrical stimulation can 
be traced back to the 1700s, and active implants such as artificial pacemakers
have been delivering electrical stimuli to human hearts since the 1950s. More 
sophisticated systems like the bionic ear were first used to restore auditory 
function to the deaf not long after. 

What is true is that the volume of research in biomedical engineering 
has grown tremendously over the past couple of decades. The technological 
developments that allowed large-scale processing of data, coupled with the
political shift that sees more money available for anything related to
biotechnology, mean that the demand for suitably qualified researchers in this
field has escalated very rapidly. Unfortunately, the training of such researchers
has not kept up with this demand. It is true that there are plenty of biologists
and plenty of engineers around, but to develop biologically applicable
technology, particularly in the case where devices must interface directly with
the biology, it is necessary to have people who are both.

This book examines one particular area of biomedical engineering:
understanding the brain from a ‘signals’ perspective. The expertise and theory
necessary to make progress in understanding the brain are well established
in both neurophysiology and engineering, but the communication link that
would allow knowledge to be carried from one discipline to the other is not
(we contend). The combined field of neuro-engineering is in its infancy.
Fast development of neuro-engineering practices is hindered by a lack of
communication and of an understanding of ‘common ground’. Neurophysiologists
are not trained in the mathematical tools commonly used by the electrical
engineer, whilst the engineer is typically not trained in the biology of human
tissue. Furthermore, over the last century both disciplines have evolved almost
entirely separately, developing their own scientific practices, idiosyncrasies,
terminology and, yes, even prejudices.

Sometimes misunderstandings arise simply because the scientific practices 
of each discipline are different. For example, imagine that both neurophysiol- 
ogists and engineers are trying to gather information so that they can under- 
stand how the brain works. Neurophysiologists are interested in as complete 
an understanding as possible. The more detail they can get hold of, the
better. In contrast, engineers are interested in how much detail can be
thrown out. How much detail they keep depends on the level of understanding
they need for a particular application. Fundamentally different approaches are 
taken by each discipline to answer the same question. 

The silos of neurophysiology and electrical engineering are very real — the 
jargon that a medical doctor speaks so naturally can dumbfound the engineer, 
whilst concepts that are fundamental to training an engineer leave the biologist 
at a loss. If the combined field is to progress faster than a new generation
of multidisciplinary-minded researchers can be trained, then communication
barriers must be broken down.


As engineers we must ask ourselves: how do we make our field 
more accessible to the neurologist? Mathematics and computation
are very useful, but what is the point if they are not understood?


As neurophysiologists we must communicate ideas by using 
language that is more accessible, either by selecting important 
information and conveying it in simpler terms, or through edu- 
cation of the broader community. 


Communication is the key: An inter-disciplinary experiment 
must be designed in an inter-disciplinary environment. Engi- 
neers and neurophysiologists must communicate their needs to 
each other, and the assumptions made must be consistent. 


This book subscribes to the idea that accessibility is key in a multi- 
disciplinary research environment. Through the study of epilepsy, a common 
neurological pathology or disorder, we present the relevant physiology and 
electromagnetics of biological systems. We describe the tools available (along 
with their limitations) for the analysis of neurophysiological data. Both en- 
gineering and neurophysiological terminology are kept to a minimum so that 
anyone belonging to either discipline, and perhaps people new to both, can 
make sense of the information. Concepts are favored over detail so that
ideas are not lost in a sea of information. Suitable references are provided
for readers who wish to explore the details.

Epilepsy is the chosen context because, aside from practical reasons (this
is the area the authors are familiar with), research in epilepsy is of common
interest to electrical engineers and neurophysiologists alike. It is a
neurological condition in which the brain behaves ‘normally’ most of the time 
but occasionally breaks down into an altered state of consciousness known as a
seizure or fit. Physical convulsions are the most widely recognized form of such 
seizures, but consciousness can be impaired in other forms including halluci- 
nations and black-outs. Epilepsy is particularly well suited for consideration 
because it holds: 


1. High social relevance: Epilepsy occurs in all age groups, with higher
incidence in infancy and old age. Causes are numerous and include genetic
or developmental abnormalities, trauma and disease, the common
denominator being malfunction of brain activity. It is estimated that
1% of the population suffers an epileptic episode at some point in
their lives, although only the 0.6% with recurring symptoms are considered
epileptic. About 25% of epileptics cannot be helped by any drug or
therapy available today. Perhaps the statistics seem small, but epilepsy
is the most common recurrent neurological disorder today. Assuming
it affects the world's 6.5 billion people (6.5 × 10^9) uniformly, this means
that 39 million suffer recurring symptoms, and 9.75 million cannot lead
normal lives because no effective treatment is available to them. In reality
the incidence of epilepsy in developing countries is likely to be greater. The
direct costs of epilepsy treatment are estimated to be about AUD$80
billion per year worldwide.¹


2. Interest to the neurophysiologist: Neurophysiologists have been trying
to figure out the causes, mechanisms and treatment of epilepsy for
millennia. Throughout history virtually everything has been blamed for
epilepsy, from events as benign as the phases of the moon to
disappointment in love affairs. Most often, though, it was the supernatural
that was blamed. In ancient Greece epilepsy became known as the ‘sacred
disease’ because it was believed that seizures were sent from the devil
and the associated visions were sent by the gods. The word epilepsy
derives from the Greek epilepsia, meaning ‘a condition of being overcome,
seized or attacked’. In Roman times epilepsy was known as passio caduca
(‘falling sickness’ or ‘falling evil’). The stricken were condemned as
sorcerers. Treatments were frivolous or religious, and inappropriate
preparation or impurities of the mind were often blamed for their failure to
cure the disorder. It was not until the 17th century that all aspects of
epilepsy were attributed to brain malfunction, initiating the search for the
differences between an epileptic and a healthy human brain. Even so,
scientific testing and classification of the many forms of epilepsy did not
eventuate until the 19th century, when conclusions were finally based
on controlled experiments and repeated observations [125, 180]. Today
modern technology has allowed a greater understanding of the processes
of epilepsy, but much remains to be discovered.

¹ This figure is based on the extrapolation of findings in [16], which estimated
the direct costs (medical, drug treatment, surgery, etc.) of the epileptic
population of Australia based on a survey performed in 1989. Cost per patient
per year was estimated to be roughly AUD$2000. It is understood that costs of
medication and treatment, as well as the availability of treatment, vary
significantly between countries, and this figure is only used to give an idea
of the economic burden. No world estimate of the cost of epilepsy could be
found, but the figure seems a reasonable average of the international
comparisons found in [84]. Indirect costs such as loss of productivity are not
included in this estimate.


3. Interest to the engineer: Engineers are drawn to the study of epilepsy 
because much of the processing can be performed on measurements (such 
as the EEG, described later) without the explicit involvement of wet 
labs. Abundant routine medical data of this nature exist and the bur- 
den to design new experiments is lessened. Thus although a lot of new 
information must be absorbed before the engineer can fully grasp the 
problem, the learning process can be gradual and there is no need for a 
major shift in their own practices. In addition, because the brain is
so complex its analysis provides the opportunity to use state-of-the-art 
tools. Not to be underestimated are the social aspects of the problem 
that provide greater motivation for those who may find typical applica- 
tions such as telecommunications or computer chip development a little 
on the dry side. The same tools that are learned for these other problems 
can be used toward understanding the brain, provided the underlying 
assumptions are revised. 
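The prevalence and cost figures in item 1 follow from simple multiplication, and can be reproduced as a minimal sketch. The inputs (6.5 billion people, 0.6% with recurring symptoms, 25% not helped by current therapy, roughly AUD$2000 per patient per year from footnote 1) are the text's own estimates, and uniform world-wide prevalence is the same simplifying assumption the text makes.

```python
# Back-of-the-envelope burden estimates, assuming uniform world-wide prevalence.
WORLD_POPULATION = 6.5e9        # people
RECURRING_FRACTION = 0.006      # 0.6% with recurring symptoms
UNTREATABLE_FRACTION = 0.25     # ~25% of epileptics not helped by any current therapy
COST_PER_PATIENT_AUD = 2000.0   # direct cost per patient per year (footnote 1 estimate)


def epilepsy_burden(population: float = WORLD_POPULATION):
    """Return (people with recurring symptoms, untreatable cases, annual direct cost in AUD)."""
    recurring = population * RECURRING_FRACTION
    untreatable = recurring * UNTREATABLE_FRACTION
    annual_cost = recurring * COST_PER_PATIENT_AUD
    return recurring, untreatable, annual_cost


recurring, untreatable, cost = epilepsy_burden()
print(f"{recurring / 1e6:.2f} million with recurring symptoms")
print(f"{untreatable / 1e6:.2f} million not helped by current treatment")
print(f"AUD${cost / 1e9:.0f} billion per year in direct costs")
```

A quarter of 39 million is 9.75 million, and 39 million patients at about AUD$2000 each comes to roughly AUD$78 billion per year, in line with the quoted AUD$80 billion.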


Being appealing to engineers and neurologists alike, as well as socially
relevant, epilepsy is short of neither funding nor interested parties. Before
delving into specifics, we introduce some important concepts that will help in
understanding much of this book. It may be useful to refer to Figure 1.1 and
Figure 1.2, which give a graphical representation of what follows.

Throughout this book we will constantly refer to systems and signals². To
fix the terminology, we start from the notion of a signal as the
fundamental concept. A signal is a convenient way to summarize or point to
a collection of measurements. In mathematical terms a signal is a function
of time, for example an EEG (described in Section 1.2). At each instant
of time, a voltage (or a collection of voltages if we use multiple electrodes)
is recorded. The sequence of all the measurements is a signal. We say that
the signal is scalar valued if we only have one measurement for each time
index, and that it is vector valued if we have multiple measurements (that is,
multiple electrodes) at each instant in time.
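One way to make the scalar/vector distinction concrete is to store a sampled signal as a list of measurement vectors, one per time index. The `Signal` class below is a hypothetical helper written only for illustration (it is not part of any EEG library) and assumes uniform sampling at a rate `fs`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Signal:
    """Hypothetical helper: a uniformly sampled signal, one measurement vector per time index."""
    fs: float                    # sampling rate in samples per second (assumed uniform)
    samples: List[List[float]]   # samples[k] holds the value(s) recorded at time k / fs

    @property
    def n_channels(self) -> int:
        """Number of measurements (electrodes) per time index."""
        return len(self.samples[0]) if self.samples else 0

    @property
    def is_scalar(self) -> bool:
        """True for one measurement per time index, False for several (vector valued)."""
        return self.n_channels == 1

    def at(self, t: float) -> List[float]:
        """Measurement vector at the sample nearest to time t (in seconds)."""
        return self.samples[round(t * self.fs)]


# One electrode gives a scalar-valued signal; three electrodes give a vector-valued one.
single = Signal(fs=512.0, samples=[[1.2], [0.8], [1.1]])
multi = Signal(fs=512.0, samples=[[1.2, 0.3, -0.5], [0.8, 0.2, -0.4]])
print(single.is_scalar, multi.is_scalar, multi.n_channels)
```

Keeping even a single electrode as a one-element vector means the same code handles both cases; only `n_channels` distinguishes them.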

The EEG is used to tell us something about a brain; the brain in our 
terminology is a system from which the EEG signal is observed or derived. 
The collection of all possible EEGs from a brain is called its behavior. More
generally, a system constrains the signals that may be observed from it, and
a system's behavior is the collection of all the signals compatible with the
system. Although hopelessly general at this level, the framework of signals and
systems is a very powerful way to understand relationships and
interdependencies.

² We use a ‘behavioral’ terminology as introduced in [136]. See also [104].
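The behavioral view can be illustrated with a toy system. Purely as an assumption for illustration, the ‘system’ here is the constraint w[k+1] = a·w[k] with a = 0.5; its behavior is every signal satisfying that constraint, so deciding whether a signal belongs to the behavior amounts to checking the constraint. The function name `in_behavior` is hypothetical.

```python
from typing import Sequence


def in_behavior(w: Sequence[float], a: float = 0.5, tol: float = 1e-9) -> bool:
    """Toy system (assumed): a signal is compatible if w[k+1] = a * w[k] for every k."""
    return all(abs(w[k + 1] - a * w[k]) <= tol for k in range(len(w) - 1))


candidates = [
    [1.0, 0.5, 0.25, 0.125],  # satisfies the constraint
    [4.0, 2.0, 1.0, 0.5],     # also satisfies it: a behavior usually contains many signals
    [1.0, 1.0, 1.0, 1.0],     # violates it, so it cannot be observed from this system
]

# The members of the behavior found among the candidates.
observed_behavior = [w for w in candidates if in_behavior(w)]
print(len(observed_behavior))
```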

Signals originating within a system are internal, but a system often needs 
to communicate with entities outside of itself. These external signals are 
known as its inputs and outputs, corresponding to incoming and outgoing
information respectively. For example, the input to the brain as a whole is
the sensory information obtained from the environment and the rest of the
body (sight, hearing, pain, taste, etc.). The outputs are the messages the brain
sends to the body enabling it to move, speak and react. 

A model of a system is an attempt to formalize the way the inputs and 
outputs interact so that behavior may be computed. In a mathematical model 
this involves the construction of equations that describe the behavior of the 
system. Often the model is a simplified representation of the original system 
because much of the detail can be omitted, in the hope that these details are 
unimportant in the process of interest. In this way a lot of the complexity is
removed from a problem whilst the relevant information is retained. To create
a model of the brain requires understanding of how the components within
this system work. However, if modeling the brain were an easy task this book
would most likely not exist. The task can be simplified by first creating models
of sub-systems that exist in the brain, that is, smaller systems within a larger
one, such as models of the cortex or the hippocampus, which are sub-systems of
the larger system formed by the brain. These smaller models may then serve as
building blocks of larger ones. This is valid so long as the fact that these
sub-systems belong to a larger one is not forgotten.
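As a minimal sketch of a mathematical model and of building a larger model from sub-system models, assume, purely for illustration, that each sub-system is a linear first-order difference equation y[k] = a·y[k-1] + b·u[k]. Real models of the cortex or hippocampus (Chapter 6) are far richer; the names `first_order`, `series`, `region_a` and `region_b` are hypothetical.

```python
from typing import Callable, List

# A system model maps an input signal to an output signal.
System = Callable[[List[float]], List[float]]


def first_order(a: float, b: float) -> System:
    """Sub-system model y[k] = a*y[k-1] + b*u[k], starting from rest (y[-1] = 0)."""
    def run(u: List[float]) -> List[float]:
        y, prev = [], 0.0
        for uk in u:
            prev = a * prev + b * uk
            y.append(prev)
        return y
    return run


def series(*subsystems: System) -> System:
    """Larger model built from sub-systems: each one's output is the next one's input."""
    def run(u: List[float]) -> List[float]:
        for sub in subsystems:
            u = sub(u)
        return u
    return run


# Hypothetical example: two sub-system models combined into one larger model.
region_a = first_order(0.9, 0.1)
region_b = first_order(0.5, 0.5)
larger = series(region_a, region_b)
print(larger([1.0, 0.0, 0.0, 0.0]))  # response to a single input pulse
```

Because `series` returns another `System`, the composite model can be used anywhere a sub-system model can, which mirrors the idea that sub-systems remain part of the larger system.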

A model may be deterministic or stochastic. In a purely deterministic 
model, once the current conditions are determined, everything about the past, 
present and future of the system is known unambiguously. This is an unrealis- 
tic situation, which in the real world exists ... well, never. A stochastic model, 
on the other hand, allows for fluctuations around its solutions which cannot 
be accounted for before they occur. These are random or stochastic elements 
that can make prediction of the future of this system difficult. Stochasticity 
can exist not only in the model but within the sources of the brain and the 
measurement of these sources, as discussed in more detail later. 
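The difference between deterministic and stochastic models shows up as soon as the same model is run twice. The sketch assumes a simple first-order model x[k+1] = a·x[k] + w[k], where w[k] is Gaussian noise with standard deviation `noise_std`; setting `noise_std = 0` makes the model deterministic. The model form and parameter values are illustrative assumptions.

```python
import random
from typing import List, Optional


def simulate(n_steps: int, a: float = 0.9, x0: float = 1.0,
             noise_std: float = 0.0, seed: Optional[int] = None) -> List[float]:
    """Run x[k+1] = a*x[k] + w[k], with w[k] Gaussian of std noise_std (0 gives a deterministic model)."""
    rng = random.Random(seed)
    x = [x0]
    for _ in range(n_steps):
        w = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
        x.append(a * x[-1] + w)
    return x


# Deterministic: the initial condition fixes the whole trajectory, so repeated runs agree.
print(simulate(5) == simulate(5))
# Stochastic: the same initial condition gives different trajectories from run to run.
print(simulate(5, noise_std=0.1, seed=1) == simulate(5, noise_std=0.1, seed=2))
```

The seed only makes a stochastic run repeatable for testing; it does not make the fluctuations predictable from the model itself.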

This book is a study of the brain as a system. We measure this system 
using the EEG, and we use these signals to understand its behavior. We also 
create models that describe the brain as seen through this signal. But what 
can the EEG really tell us about what is happening within the brain? A 
typical EEG machine records up to 64 channels, 512 times a second, each 
with 14 bits of resolution. This means that the EEG records roughly

$$64 \ \text{channels} \times 14 \ \text{bits/sample} \times 512 \ \text{samples/second} = 458\,752 \ \text{bits/second} \sim 10^6 \ \text{bits/second}.$$

Now, assuming there are 100 billion ($10^{11}$) neurons in the brain, and that
it can be divided into cortical columns each containing $10^5$ neurons (see
Chapter 1), we have a system which can be described with roughly $10^6$
states. This means that the EEG gives us roughly

$$\frac{10^6 \ \text{bits/second}}{10^6 \ \text{states}} = 1 \ \text{bit per second per state},$$

that is, 1 bit of information per second to describe the activity of the
$10^5$ neurons within a cortical column! This admittedly crude analysis
already gives us some idea that the EEG is a very blunt measurement of a very
complex system, and it should come as no surprise that what it can tell us
about the brain is severely limited. The general argument here is
formalized toward the end of Chapter 6.
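The number game above can be reproduced in a few lines. The channel count, resolution, sampling rate, neuron count and column size are the text's round figures; the sketch keeps the exact product rather than rounding it to an order of magnitude.

```python
import math

CHANNELS = 64
BITS_PER_SAMPLE = 14
SAMPLES_PER_SECOND = 512
NEURONS = 10**11             # ~100 billion neurons in the brain
NEURONS_PER_COLUMN = 10**5   # neurons per cortical column


def eeg_bits_per_state():
    """Return (EEG bit rate in bits/s, number of column 'states', bits per second per state)."""
    bit_rate = CHANNELS * BITS_PER_SAMPLE * SAMPLES_PER_SECOND
    states = NEURONS // NEURONS_PER_COLUMN
    return bit_rate, states, bit_rate / states


rate, states, per_state = eeg_bits_per_state()
print(rate, states, round(math.log10(rate)), round(per_state, 2))  # prints: 458752 1000000 6 0.46
```

The exact figure is about 0.46 bit per second per state, so rounding it up to 1 bit per second per state, if anything, flatters the EEG.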

In this book we expand on this very simple number game and explore 
the usefulness as well as the limitations of the EEG. We focus specifically on 
the problems of both seizure detection and prediction. We make an effort 
to provide conceptual information on all aspects of the problem, beginning
with the physiology and physics involved in the generation and measurement
of activity, and then use this knowledge to develop strategies to address the
problem.

The text is organized as follows. Chapter 1 summarizes the physiology
and the fundamental ideas behind the measurement, analysis and modeling of 
the epileptic brain. We introduce the EEG as a measured signal, and explain 
its use in the study of epilepsy. 

Chapter 2 provides an explanation of the type of brain activity likely to 
register in EEG measurements. It expands on qualitative ideas presented in 
Chapter 1 by providing quantitative analysis of the populations of neurons 
that contribute to both scalp and cortical EEG. At the same time the limita- 
tions and the effects that choices made in the recording process have on the 
data are discussed. 

Chapter 3 then provides an overview of how EEG records have been analyzed
to date. The applicable engineering and signal processing methods are
numerous. The scope is narrowed by concentrating on the mathematics relevant
to the problem of classification of EEG. Chapter 4 then deals with using the
features extracted in Chapter 3 to differentiate between, or classify,
inter-seizure, pre-seizure and seizure EEG. This material is applicable to the
detection as well as the prediction of seizures.

Chapter 5 expands on Chapter 3 by concentrating only on the problem 
of seizure detection, that is, the differentiation between seizure and
non-seizure EEG. It is a simpler problem than seizure prediction and has been
in development for over 20 years, but one that nevertheless requires further
attention. Lack of standardization makes published results difficult to
compare and understand. A broad review of present algorithms is given, with
each algorithm applied to a common EEG data set (to our knowledge, the first
time this has been done).

Although suitable for the task of detection, the ideas presented in Chapter
5 are black box methods that are data driven and therefore do not provide
information beyond what is available in the recordings. For more complex
problems that require understanding of the underlying mechanisms of the brain,
a survey of physiologically based dynamic models of brain activity is
presented in Chapter 6.

Finally, Chapter 7 addresses the fundamental question: can seizures be 
predicted? Much research assumes that seizures are predictable, yet little 
work has been dedicated to the validation of this assumption. It is proposed 
in this chapter, through the analysis of epileptic activity spanning from 3 hours 
to 25 years, that seizures are predictable but the amount of data required is 
far greater than previously thought (if we rely only on the EEG), and that 
the measurements being used are not suitable. Furthermore, it is proposed 
that the problem of seizure prediction is only likely to succeed if approached 
differently than has been done to date. For example, we may find it useful
to add information from a different type of measurement, or to employ
physiologically based methods such as those presented in Chapter 6.

We started as a small group of research engineers at The University of Mel- 
bourne, Australia, working together with neurophysiologists at St Vincent's 
Hospital, Melbourne. Our question was simple: what can signal processing 
contribute to understanding the epileptic brain? We were naturally interested 
in the EEG as a measurement early on — it is so often used clinically in the 
diagnosis of epilepsy that it is not surprising it is also used in many aspects of 
research. As time passed we became increasingly interested in its limitations, 
and were surprised to find how little these are understood. This is how this 
book was conceived, and we hope it is used by engineers and neurophysiolo- 
gists alike to understand this incredibly useful signal. 
Introduction


The brain is a very complex system composed of billions of interconnected 
neurons. As observers we understand the brain system from the measurements 
or signals we obtain from it. If the spatial scale at which measurements are 
made is very small, the measurement could pertain to a single neural cell. 
In contrast, at larger scales the measurement pertains to a large collection of
neurons. Figure 1.1 illustrates the issue of spatial scale.

In this book we concentrate on the EEG as the measurement that is used 
to acquire a signal. The EEG records the time-evolving voltages generated by 
brain activity, and is described in Section 1.2. However, the measured signal is
not necessarily the real signal generated by the brain, but its projection onto
the recording equipment. Thus the measurement process is itself a system,
with its inputs arising from the outputs of the original system, and its output
the resultant measured data. This is shown in Figure 1.2. The system as a whole
is not restricted to the generating system (the brain), but must incorporate the 
measurement system (the EEG). Analysis of the measurements can provide 
insight into function and dysfunction of the original system, so long as the 
recording process itself is well understood. 
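The split between generating system and measurement system can be sketched as a toy measurement chain following the stages named in Figure 1.2 (amplification, filtering and digitization). The gain, moving-average filter length, 14-bit resolution and ±1 V converter range below are illustrative assumptions, not the specification of any real EEG amplifier. The usage lines show two slightly different source signals producing identical records, one concrete sense in which the measured signal is only a projection of the real one.

```python
from typing import List


def measure(source: List[float], gain: float = 1000.0, window: int = 4,
            bits: int = 14, full_scale: float = 1.0) -> List[int]:
    """Toy measurement system: amplify, low-pass filter, then digitize (all values assumed)."""
    # 1. Amplification: source potentials in microvolts become millivolts.
    amplified = [gain * x for x in source]
    # 2. Filtering: a causal moving average stands in for the recording low-pass filter.
    filtered = []
    for k in range(len(amplified)):
        segment = amplified[max(0, k - window + 1): k + 1]
        filtered.append(sum(segment) / len(segment))
    # 3. Digitization: clip to the converter range and quantize to 2**bits levels.
    step = 2 * full_scale / 2 ** bits
    codes = []
    for y in filtered:
        y = min(max(y, -full_scale), full_scale - step)
        codes.append(round(y / step))
    return codes


# Two different source signals (100 uV and 100.01 uV) produce the same record.
brain_a = [100e-6] * 8
brain_b = [100.01e-6] * 8
print(measure(brain_a) == measure(brain_b))
```

Every stage discards something (tiny differences, fast fluctuations, values outside the converter range), which is why the measurement system has to be understood before the recordings are interpreted.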

This chapter is dedicated to presenting concepts necessary to understand 
the brain as a system. In Section 1.1 the physiology of the brain relevant to 
epilepsy is summarized. Section 1.2 then concentrates on the measurement 
and analysis of this brain activity, and Section 1.3 summarizes how physiology 
and measurement can be turned into a mathematical model of brain dynamics. 
Section 1.4 discusses how the presence of stochastic elements affects EEG 
sources, measurement, analysis and modeling. 

Each section is introductory only because more detail is included in later 
chapters, with the exception of Section 1.1 where we attempt to contain all 
relevant (albeit simplified) physiology. All sections focus on epilepsy and the 
specific problem of differentiation between seizure and non-seizure activity. 
[Figure 1.1: example systems of different scales (System 1 to System 4) and the EEG equipment]


FIGURE 1.1: A system is defined by the measured signals, and thus the scale
of the system depends on the scales of activity that significantly affect this
measurement. In this figure, example systems of different scales are shown. If
a measurement of the brain only involves the activity of single neurons, then the
smallest of these systems may be used. However, in a signal such as the EEG
larger scales must be incorporated, including the effects of the entire head,
the recording equipment and maybe even environmental factors not shown.
The smaller sub-systems may be used to explain the larger system.


1.1 The Brain and Epilepsy 


The brain is part of the central nervous system (CNS) and is responsible 
for interpreting sensory information received from the environment so that 
humans can behave as humans do. Each region of the brain has its own task 
in this process. Figure 1.3(a) shows a functional decomposition of the cerebral
cortex, a thin layer approximately 2-3 mm thick that covers the entire surface
of the brain. The different functional regions of the cortex (temporal lobe, 
parietal lobe, occipital lobe, frontal lobe) are responsible for motor control 
as well as cognitive and memory functions. Sensory information is passed 
on to the cortex from a subcortical system known as the thalamus, shown in 
Figure 1.3(b). The thalamus also plays an important role in regulating the 
interaction between different regions of the brain. 

On the order of 10-100 billion densely interconnected nerve cells, called
neurons, make up the cerebral cortex. How individual neurons work is
understood quite well, but it is the complex ways in which they interconnect and
interact that determine brain function. How these networks function is less 
well understood. 

The general structure of these networks is illustrated in Figure 1.3(b) and 
(c). All mammalian brains look roughly like this, although the details vary. 
For example, neurons in the cerebral cortex of humans are much more densely
interconnected than in other animals, and this is believed to be one reason for
the more sophisticated capabilities of humans. The gray matter in Figure 1.3(b)
is the cortex, folded to form gyri and sulci and containing all the cortical
neurons. Between cortex and subcortex is the white matter, a region composed
mostly of connections made between different areas of the brain. The majority
of these connections are between different cortical regions, but subcortical
systems such as the thalamus also communicate with the cortex through the
white matter. Very few neurons can be found in this region.


FIGURE 1.2: Above is an example of a generating system (the brain) and a
measurement system (such as the EEG). In this over-simplistic representation
of the brain system, signals that act as inputs are integrated together after
some delay. The integration is the output of the generating system that
becomes the input to the measurement system. After amplification, filtering
and digitization the output of the measurement system is a record of the
activity in the generating system. Shown also are the internal signals within
each sub-system.


The way in which the brain works can be described in terms of behavior
at different spatial scales, traditionally divided into the micro-scopic,
meso-scopic and macro-scopic. In this book micro-scopic describes behavior at
small scales (μm), encompassing a single or a few brain cells. Macro-scopic
describes behavior at large scales (cm), spanning whole regions of the brain. The
intermediate scale, meso-scopic, describes behavior of networks of neurons 
spanning millimeters rather than centimeters. In particular the word is used 
to describe cortical columns, believed to be the main functional units of the 
cortex and described in more detail in Section 1.1.2. 


FIGURE 1.3: A functional decomposition of the entire brain. (a) shows
how the cerebral cortex is divided into the four lobes, each responsible for
different cognitive and motor functions. (b) is a cross-section of the brain
showing some of the major subcortical systems, the most important for this
book being the thalamus, whose principal role is to relay sensory information
to the cortex. Also note the relative size of the gray matter (where the neurons
are) and the white matter (used for connections between sub-systems). In (c)
the flow of information to and from one cortical column (described in Section
1.1.2) and other regions of the brain is shown. Notice that the inputs to this
column come from other cortical columns as well as sub-cortical systems. Most
connections are projected through the white matter, although nearby columns
are connected more directly. The graphic in (b) is a modified reproduction of
a figure taken from Anatomy of the Human Body ([56]), originally published
in 1918 and now in the public domain.


Brain function can also be described in terms of different temporal scales.
These are closely tied to the spatial scales because of naturally occurring
conditions in neural dynamics. The micro-scopic scale is generally associated
with frequencies above 1000 Hz because the mechanisms within single
neurons are very fast. The meso-scopic scales are associated with activity
between 10 and 1000 Hz because faster events are negligible relative to the
average behavior of ensembles of neurons. Activity at the macro-scopic level is
associated with frequencies in the range of 1-100 Hz because spatially averaging
an even larger number of neurons filters out higher frequencies. These ranges are
not rigid, but are used as a guideline for the expected behavior, as explained
in more detail in Chapter 2.
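The guideline frequency ranges above overlap, so a given frequency may be associated with more than one scale. A small lookup makes this explicit; the table simply restates the text's ranges, and the function name `scales_for` is hypothetical.

```python
from typing import Dict, List, Tuple

# Guideline temporal ranges from the text, in Hz (not rigid boundaries).
TEMPORAL_SCALES_HZ: Dict[str, Tuple[float, float]] = {
    "micro-scopic": (1000.0, float("inf")),
    "meso-scopic": (10.0, 1000.0),
    "macro-scopic": (1.0, 100.0),
}


def scales_for(freq_hz: float) -> List[str]:
    """Return every scale whose guideline range includes freq_hz."""
    return [name for name, (low, high) in TEMPORAL_SCALES_HZ.items()
            if low <= freq_hz <= high]


print(scales_for(50.0))    # falls in both the meso- and macro-scopic ranges
print(scales_for(5000.0))  # falls only in the micro-scopic range
```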

Epilepsy is a problem of scale. On the one hand it is a macro-scopic phe- 
nomenon that encompasses a large portion if not the whole of the brain. On 
the other hand, epilepsy's root cause must be found in the chemokinetic pro- 
cesses that may be associated with meso- or micro-scopic cellular biology. In 
addition, understanding the macro-scopic may not be possible without knowl- 
edge of some of the micro-scopic. The relevant information must be retained. 
For example, knowing the gating mechanisms that transfer information
between neurons is important for understanding macro-scopic brain
activity, but knowing the individual complex protein structures involved in the
different gate types is unlikely to help us understand macro-scopic recordings,
and this complexity can be omitted for the purposes of this problem.

The remainder of this section provides a crash course in how the brain
works at each scale, limited to the mechanisms that are believed to contribute 
to the understanding of epilepsy at the macro-scopic scale. ‘Believed’ is used 
here because epilepsy is not completely understood, and elements that are at 
this point thought more or less irrelevant may become relevant in the future. 
Wherever possible efforts are made to avoid superfluous use of medical jargon 
so that important points are not lost in the translation process. Of course 
some terminology is always necessary. In any case, as this introduction is by 
necessity brief and limited to the bare minimum needed to proceed with an 
understanding of the EEG system, readers are encouraged to obtain a detailed 
understanding of cellular mechanisms. An excellent introductory text is [80], 
although any other basic physiology book may be used. 


1.1.1 Micro-Scopic Dynamics: Single Neurons


It is the neurons in the CNS that are responsible for the processing and 
transmission of information, but they are not the only type of cell present. 
Glial cells in the cortex outnumber neurons by just under 4 to 1, but their role 
is not as clearly understood. They are believed to be responsible for support 
roles such as provision of structure, insulation and maintenance [80]. More 
recent studies suggest that the role of glial cells may not be so passive.
However, since their complete function is likely to remain unknown for some
time, they are largely ignored in this text, although one should remember that
they exist.

Figure 1.4(a) shows a picture of a stained cortical slice in which many 
neural cells are visible. These are known as pyramidal neurons and are the 
most common nerve cells found in the cortex. A structural decomposition of 
a typical pyramidal neuron is shown in (b). Neurons come in many different 
shapes and sizes, but they are composed of four basic structures — dendrites, 
soma or cell body, axon and synaptic terminals. 

FIGURE 1.4: A typical pyramidal neuron found in the cerebral cortex. In (a)
is a stain obtained from cortical tissue which shows about 1% of neurons in
the region, taken from brainmaps.org and reproduced freely using the Creative
Commons Attribution 3.0 license (http://creativecommons.org/licenses/by/3.0/).
(b) is a structural decomposition of a neuron, showing the input regions at
the synapses on the dendritic tree. These inputs are integrated at the soma or
cell body and result in an output on the axon.

The inputs to a neuron are chemical currents (flows of electrically charged
ions) that occur at the synaptic terminals spread across the dendritic tree.
These currents are transmitted to the cell body, which accumulates all these
inputs. If the integrated input reaches a threshold voltage, an action
potential (another current, much larger in magnitude than any single synaptic
input) is fired and travels along the axon. If this threshold is not reached,
no action potential occurs. Action potentials are the output of the neuron:
the axon terminals closely approach, or synapse onto, the dendrites of other
neurons, so these outputs become the input to the receiving cells. A system
representation of a single neuron is shown in Figure 1.5. Here the role of the
cell body is more than that of a pure integrator of dendritic inputs, as it is
also responsible for restoring charge concentrations in the extra-cellular fluid.
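The integrate-to-threshold behavior described above can be sketched with a leaky integrate-and-fire neuron. This is a standard simplification and much cruder than the Hodgkin-Huxley model used for Figure 1.5: it keeps only integration, threshold and reset, not the shape of the action potential. The function name `simulate_lif` and all parameter values below are illustrative assumptions, not values from the text.

```python
def simulate_lif(input_current, dt=0.1, tau_m=10.0, v_rest=-70.0,
                 v_threshold=-55.0, v_reset=-75.0, resistance=10.0):
    """Leaky integrate-and-fire neuron, integrated with forward Euler.

    Hypothetical parameters chosen for illustration only.
    input_current: injected current (nA), one value per time step.
    dt and tau_m are in ms, voltages in mV, resistance in megaohms
    (so resistance * current is in mV).
    Returns (voltage trace, list of spike times in ms).
    """
    v = v_rest
    trace = []
    spike_times = []
    for step, current in enumerate(input_current):
        # Leak pulls v back toward rest; the input current pushes it up.
        dv = (-(v - v_rest) + resistance * current) / tau_m
        v += dv * dt
        if v >= v_threshold:
            # Threshold reached: record an action potential, then reset.
            spike_times.append(step * dt)
            v = v_reset
        trace.append(v)
    return trace, spike_times


# 100 ms of constant input at two strengths.
_, weak_spikes = simulate_lif([1.0] * 1000)
_, strong_spikes = simulate_lif([2.0] * 1000)
print(len(weak_spikes), len(strong_spikes))
```

With these assumed values a 1 nA input drives the membrane only to about -60 mV, below the -55 mV threshold, so the model neuron never fires; doubling the input carries it past threshold and it fires repeatedly, resetting after each spike.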

Repeated crossing of the threshold voltage at the soma can generate multiple
action potentials at a maximum rate of about 1000 Hz. Action potentials
travel at about 5 to 10 meters per second [146] and do not deteriorate along
the axon because of built-in regenerative processes. The shape of the voltage
difference between the inside and outside of the axon created by a typical
action potential is shown in Figure 1.5(c).

FIGURE 1.5: In (a) the features of a real neuron shown in Figure 1.4(b) are
used to construct an equivalent system model in which the inputs and the
outputs are defined. The functionality of the cell body system can be
described by mathematical equations that dictate how the outputs (e.g., action
potentials, restoring charge concentration in extra-cellular fluid) react to
the given inputs. In (b) and (c) are examples of how the Hodgkin-Huxley model
equations (a system in the form of (a)) can be used to simulate input and
output signals; (c) shows the neuron output, a simulated action potential.
These are only representative waveforms; much variability exists in their
shape. The input (in this case an excitatory post-synaptic potential or EPSP,
in (b)) is much longer in duration than the output (an action potential, in
(c)), but also much smaller in amplitude. Many PSPs must be integrated at the
soma for an action potential to be fired.

Synaptic transmission is the process by which action potentials arriving at
the end of the axon of a transmitting (pre-synaptic) neuron are interpreted by
the dendrites of the receiving (post-synaptic) neuron. The pre-synaptic cell
releases chemicals known as neurotransmitters that control the response of the
post-synaptic cell. The type of neurotransmitter released varies, but these
chemicals are responsible for generating a post-synaptic potential (PSP) that
falls into one of two categories:


• Excitatory post-synaptic potential (EPSP): The response at the synapse is
much like a miniature version of an action potential in the post-synaptic
cell, except longer in duration and smaller in amplitude. An EPSP is shown
relative to the action potential in Figure 1.5. The generation of an EPSP
brings the soma closer to threshold and thus increases the likelihood that an
action potential is fired.


• Inhibitory post-synaptic potential (IPSP): Opposite in effect to an EPSP.
The probability of a resultant action potential decreases.


A single IPSP typically has a larger effect than a single EPSP because
inhibitory synapses tend to form closer to the soma. However, EPSPs outnumber
IPSPs, and thus the effects level out. The balance of incoming IPSPs and
EPSPs on a single neuron determines whether the post-synaptic cell fires an
action potential: it is the relative numbers of active EPSPs versus IPSPs
that matter¹.
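The point about relative numbers can be illustrated with a toy calculation. The sketch below assumes that PSPs sum linearly at the soma (real dendritic integration is not linear) and uses hypothetical amplitudes: 0.5 mV per EPSP, 1.5 mV per IPSP, and a 15 mV gap between resting potential and threshold. None of these values comes from the text.

```python
# Hypothetical amplitudes (mV), chosen only to illustrate the argument.
EPSP_MV = 0.5            # depolarization contributed by one EPSP
IPSP_MV = 1.5            # hyperpolarization from one IPSP (larger: closer to soma)
THRESHOLD_GAP_MV = 15.0  # distance from resting potential to threshold


def net_depolarization(n_epsp: int, n_ipsp: int) -> float:
    """Summed PSPs at the soma in mV relative to rest, assuming linear summation."""
    return n_epsp * EPSP_MV - n_ipsp * IPSP_MV


def fires(n_epsp: int, n_ipsp: int) -> bool:
    """True if the summed PSPs reach threshold."""
    return net_depolarization(n_epsp, n_ipsp) >= THRESHOLD_GAP_MV


for n_epsp, n_ipsp in [(40, 0), (40, 5), (60, 10)]:
    print(n_epsp, n_ipsp, net_depolarization(n_epsp, n_ipsp), fires(n_epsp, n_ipsp))
```

Under these assumptions five IPSPs cancel fifteen EPSPs (5 × 1.5 mV = 15 × 0.5 mV), so forty EPSPs fire the cell on their own but not alongside five IPSPs, while sixty EPSPs still reach threshold against ten IPSPs: the outcome depends on the relative numbers, not on either count alone.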

Action potentials and PSPs are possible only because of the cell membrane,
the material that separates internal fluids from the cerebrospinal fluid (CSF)
surrounding the cell. Under normal conditions the cell membrane is largely
impermeable to ions. Action potentials and PSPs occur because of the presence
of gates that selectively allow ions to pass through this membrane. These
gates open via different mechanisms. One example is the voltage-gated ion
gate, which reacts to differences in voltage between the inside and the
outside of the neuron: when threshold is reached these gates open to allow
ions to flow, and an action potential ensues. At the synapses the gating
triggers are chemical, and it is the presence of a neurotransmitter that
allows the gates to open. Many other types of gates exist, including in-built
mechanisms that restore chemical balance after a gate is opened. These types
are omitted here for simplicity, and synaptic transmission is simplified to
the promotion or inhibition of a resultant action potential. More information
can be found in [80].


1.1.2 Meso/Macro-Scopic Dynamics: Neural Networks 


In the cerebral cortex two types of neurons form 90% of the total population:
the pyramidal cells shown in Figure 1.4 and others known as inter-neurons.
The axons of pyramidal cells form only excitatory synapses with other neurons,
and those of inter-neurons only inhibitory ones. The massive interconnectivity
between these means that a single neuron may be transmitting and receiving
information to and from thousands of other neurons. A pyramidal neuron has on
the order of 1,000-10,000 synapses on its dendritic tree, and its axon
projects to a similar number [168]. For this reason, no single EPSP or IPSP
is the determining factor for a post-synaptic action potential; the cell body
integrates all incoming signals.

¹This is a simplified picture of how neurons fire. In reality, depending on the
type of neuron, many different behaviors can be observed in response to EPSPs
and IPSPs, including single, multiple, or burst firing.

FIGURE 1.6: The layered structure of cortical columns. (a) shows a
reproduction of a stain demonstrating the stratified manner in which a typical
cortical column is organized. Each layer has different neural densities and
types. In (b) the cortico-cortical and thalamo-cortical projections are shown.
The layered structure is believed to be responsible for the organization of
the inputs and outputs of the cortex. Notice the high level of redundancy
built in: the thalamus may communicate with a column directly, or indirectly
through another column. Strong connections also exist between the layers of
the column, but these are not shown here. In practice, a cortical column is
often treated as a structurally uniform system with the same inputs and
outputs as shown, but without specifying where along the column these signals
connect.

Because the connections determine behavior in the brain, it is worthwhile
understanding a little about its structure. Looking at the cortex, 1 mm³ of
tissue contains ~50,000 neurons and on the order of 3 × 10⁸ synapses, 84% of
them excitatory and the rest inhibitory [168]. About 70% of cortical neurons
are excitatory pyramidal neurons, 20% inhibitory inter-neurons, and the
remainder other types that are most often also excitatory. The
under-representation of inhibition in the histology of a typical neural tissue
sample should not be interpreted as a lower level of inhibition, because the
effects of IPSPs can be much larger than those of EPSPs. In any case the
statistics are approximate and vary with location.
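As a consistency check, the per-mm³ figures can be divided through. The sketch takes the synapse count as 3 × 10⁸ per mm³ (the exponent consistent with the 1,000-10,000 synapses per neuron quoted earlier) and simply applies the stated percentages; the variable names are illustrative.

```python
# Figures per 1 mm^3 of cortex, as quoted in the text (approximate).
neurons = 50_000
synapses = 3e8             # read as 3 x 10^8 (assumption, see lead-in)
excitatory_fraction = 0.84

synapses_per_neuron = synapses / neurons
excitatory_synapses = synapses * excitatory_fraction
inhibitory_synapses = synapses - excitatory_synapses

pyramidal_neurons = 0.70 * neurons   # excitatory pyramidal cells
interneurons = 0.20 * neurons        # inhibitory inter-neurons

print(f"synapses per neuron: {synapses_per_neuron:,.0f}")
print(f"excitatory / inhibitory synapses: "
      f"{excitatory_synapses:.2e} / {inhibitory_synapses:.2e}")
print(f"pyramidal / inter-neurons: {pyramidal_neurons:,.0f} / {interneurons:,.0f}")
```

About 6,000 synapses per neuron falls inside the 1,000-10,000 range, which is why 10⁸ is the consistent reading of the synapse count.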
The numbers alone do not draw a complete picture: the organization of the
cortex is not at all random. The volume of 1 mm³ used above is representative
of the size of what is known as a cortical column, a compartment within which
all neurons are connected to almost every other neuron, but with relatively
few connections projecting outside of it. A cortical column is believed to be
the basic functional unit of the brain, although in reality columns are not
discrete volumes with sharp boundaries.

Connections between cortical columns are called cortico-cortical projections.
To a lesser extent connections are also made to subcortical networks, most
importantly the thalamus, which is responsible for relaying sensory information
to and from the cortex, among other things. These are called thalamo-cortical
projections. Cortico-cortical projections outnumber thalamo-cortical
projections by a ratio of 100 to 1. Figure 1.7 shows the possible connections
made between a cortical column and the thalamus.

Functionally the existence of cortical columns that can act relatively au- 
tonomously allows parallel and therefore faster processing of information [149]. 
Simulations suggest that one of the central roles of the thalamus is to syn- 
chronize and de-synchronize the activity between multiple cortical columns 
so that they can work together or independently on any given task. This 
synchronization is achieved through rhythmic inputs to the relevant columns. 
In this way the functionality of the brain can be re-organized dynamically 
depending on the need [68]. 

On a global scale, the resultant network formed by thalamo- and cortico- 
cortical projections must maintain a close balance between excitation and in- 
hibition. Too much inhibition and the brain cannot work, too little and neural 
activity gets out of control. Epilepsy, discussed in Section 1.1.4, is one form 
of such loss of control. Because projections between regions of the brain play 
an important role in brain function they are discussed next. 
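The need for balance can be illustrated with a two-population firing-rate model in the spirit of Wilson-Cowan models: one excitatory population E with recurrent self-excitation, and one inhibitory population I driven by E that feeds back onto E. This is a deliberately crude sketch, not a model used in this book; the linear-threshold form, the weights, the function name `simulate_ei` and the saturation rate `r_max` are all assumptions chosen for illustration.

```python
def simulate_ei(w_ei, w_ee=1.5, w_ie=1.0, drive=1.0, tau=10.0, dt=1.0,
                steps=1000, r_max=100.0):
    """Euler-integrate a toy excitatory/inhibitory rate model (hypothetical weights).

    w_ei: strength of inhibition onto the excitatory population.
    w_ee: recurrent excitation (>1, so E is unstable on its own).
    w_ie: excitation of the inhibitory population by E.
    Rates are clipped to [0, r_max] to mimic saturation of firing.
    Returns the final (E, I) rates.
    """
    e = i = 0.0
    for _ in range(steps):
        # Both derivatives use the rates from the previous step.
        de = (-e + w_ee * e - w_ei * i + drive) / tau
        di = (-i + w_ie * e) / tau
        e = min(max(e + dt * de, 0.0), r_max)
        i = min(max(i + dt * di, 0.0), r_max)
    return e, i


for w_ei in (2.0, 0.2):
    e, _ = simulate_ei(w_ei)
    print(f"inhibition {w_ei}: E settles at {e:.3f}")
```

For these weights the quiet state is stable only when w_ei exceeds 0.5; with w_ei = 2 the excitatory rate settles at 1/(w_ei - 0.5), about 0.667, whereas with w_ei = 0.2 recurrent excitation wins and E runs up to its ceiling, a crude analogue of activity getting out of control.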


1.1.2.1 Cortico-Cortical Projections 


Even within a cortical column the neurons are not organized randomly. Figure
1.6 shows that six layers can be identified in the 2 mm thick cortex. The type
and number of connections within each layer differ. The overall structure,
though, can be said to be roughly responsible for the organization of inputs
and outputs of the column [80]. A rule of thumb is that cortico-cortical
projections occur predominantly in the top layers, while subcortical
projections occur mostly in the deeper layers. Although the structural
decomposition of a column is relatively well understood, this level of detail
is not necessary for macro-scopic dynamics, and cortical columns are treated in
terms of their inputs and outputs, irrespective of where these occur.

Both excitatory and inhibitory projections form the cortico-cortical
connections, but in general inhibitory projections remain relatively local.
Cortico-cortical projections are most dense between columns that are nearby,
but an incredibly large number of long-range and intra-hemispheric projections
also exist. The main constituents of the white matter in Figure 1.3(b) are the
fibers that form cortico-cortical connections. A single axon extending from a
pyramidal neuron synapses with neurons in multiple other cortical columns.

FIGURE 1.7: Thalamo-cortical loop. The figure shows the connections that are
possible within two sub-systems (a cortical column and the sub-cortical
thalamus) as well as the connections between them. Notice that each
sub-system contains one inhibitory and one excitatory mechanism (TRN and SRN
respectively in the thalamus) and that it is the excitatory parts that project
out of each sub-system. Inputs, on the other hand, connect to both excitatory
and inhibitory components. The long-range projections between the sub-systems
must traverse the white matter and thus experience delays on the order of
20-50 ms that do not affect the local internal signals of each sub-system.
External inputs are also shown in this diagram.


Most cortical projections are reciprocal, that is, if a column has synapses
that connect to neurons in another column, then more than likely this second
column also projects axons to the first. Thus the major input to any cortical
area is from other cortical areas, and 75% of all synapses in the cortex are
from one pyramidal neuron to another [21, 168], occurring locally or through
long-range connections.


1.1.2.2 Thalamo-Cortical Projections 


Only about 1% of the fibers in the white matter are projections to and from
subcortical structures; thus the input to a cortical column from the thalamus
is far smaller than that from other columns [168]. However, the manner in
which these connections occur suggests that this small input is amplified by
local feedback mechanisms.

In any case the role of the thalamus in the functionality of the cortex is not
to be under-estimated. If the thalamus were simply a way to relay sensory
information to the cortex, then its presence would be somewhat redundant.
Amongst other things, the thalamus is also responsible for regulating the flow
of information to the cortex, which is possible because of the structure of
the thalamo-cortical loop shown in Figure 1.7. Again, only excitatory synapses
project to and from the cortex. The excitatory synapses projecting from cortex
to thalamus also synapse with inhibitory neurons in the thalamic reticular
nucleus (TRN). The TRN is reciprocally connected to the sensory relay nucleus
(SRN), responsible for interpreting and transmitting sensory input. This
inhibitory loop is a way that cortical activity itself can be used to regulate
how much of the information arriving at the thalamus from the environment is
relayed to the cortex [169].
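The gating role of this loop can be sketched with a steady-state, linear-threshold toy model: the SRN relays sensory input minus TRN inhibition, and the TRN is driven by both cortical feedback and the SRN. The weights and the function name `relayed_fraction` are hypothetical, and the sketch ignores the delays and dynamics of the real loop; it only shows the direction of the effect.

```python
def relayed_fraction(sensory, cortical, g_inh=0.5, w_cortex_trn=1.0,
                     w_srn_trn=0.5):
    """Fraction of a sensory input that the SRN relays at steady state.

    Toy model with hypothetical weights:
        srn = sensory - g_inh * trn                       (TRN inhibits SRN)
        trn = w_cortex_trn * cortical + w_srn_trn * srn   (cortex and SRN excite TRN)
    The two linear equations are solved for srn, which is then clipped
    at zero because firing rates cannot be negative.
    """
    if sensory <= 0:
        raise ValueError("sensory input must be positive")
    srn = (sensory - g_inh * w_cortex_trn * cortical) / (1.0 + g_inh * w_srn_trn)
    return max(srn, 0.0) / sensory


for cortical in (0.0, 1.0, 2.0):
    print(cortical, relayed_fraction(1.0, cortical))
```

In this sketch raising cortical activity lowers the fraction of the sensory input that is relayed, and strong enough cortical feedback shuts the relay off entirely, which is the sense in which the cortex can regulate its own input through the TRN.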

Signals traveling along thalamo-cortical projections experience delays on
the order of 20-50 ms [146].


1.1.3 Neurotransmitters and Neuromodulators 


The chemistry in the brain is a lot more complex than has been alluded to in 
previous sections. Chemical interaction with the physiology has been reduced 
to either excitatory or inhibitory on the premise that it is the effects that are 
important rather than the names and particular mechanisms related to each 
substance. However a brief word on the different types of chemicals involved 
in brain function is necessary. 

The neurotransmitters in the brain can be either fast acting or slow acting.
Fast-acting neurotransmitters are responsible for local inhibition and
excitation, and their concentrations are important in the generation of
signaling at the local neural network level. Complex mechanisms exist that
regulate the balance of chemicals around the cells: enough neurotransmitter
must be released for an ion gate to open, and it must then be appropriately
re-absorbed. If insufficient neurotransmitter is released, the gate will not
open; if it is not re-absorbed, the gate will not close. The neurotransmitter
must also be restricted to a particular synapse, because if it spreads too far
other synapses may be activated.

In the human brain, the neurotransmitters most commonly responsible for
excitation and inhibition are glutamate and gamma-aminobutyric acid (GABA)
respectively. However, GABA is also known to cause excitation depending on
location in the brain [27, 89], again emphasizing that in order to understand
the brain it is best to talk about the effects rather than the chemicals
themselves.

The slow-acting neurotransmitters are referred to as neuromodulators because
they are responsible for modulating the activity in a given brain region.
They control the background activity or background state in this region
relative to nearby regions. Neurotransmitters then work locally to generate
activity in the context of the state provided by the more diffuse mechanisms
of neuromodulators. Furthermore, the release and re-uptake of
neurotransmitters at the synaptic clefts occur on a shorter timescale than
those of neuromodulators.

Although the complex mechanisms of neuromodulators are not clearly
understood, one of their roles is believed to be controlling the level of
coupling between cortical columns. When these columns behave similarly, brain
activity is more global and slower, and the overall parallelism in processing
is reduced. Sleep is one example where the brain works in a more global way,
because the necessity for complex processing is reduced. When columns act more
independently, more tasks are possible at once; this is associated with
increased levels of alertness.

Example neuromodulators include noradrenaline and dopamine, which de- 
couple cortico-cortical interactions through inhibitory mechanisms and pro- 
mote local interactions to allow increased parallel processing. Serotonin, on 
the other hand, suppresses local activity (again through inhibitory mecha- 
nisms) and promotes interaction between cortical columns. Noradrenaline 
and dopamine are found in highest concentration during alertness, whilst the 
opposite is true for serotonin [120]. 

If any of the mechanisms that regulate the concentration, release and 
uptake of chemicals breaks down then neurons and networks do not function 
correctly. Thus it is not only the physiology of the neuron that must remain 
intact, but also the mechanisms that maintain chemical balance. Malfunction 
in any of these can cause pathological behavior at the micro- or macro-scopic 
scale, depending on the type and severity of the malfunction. 


1.1.4 Epilepsy — A Malfunctioning Brain 


Epilepsy is a condition resulting from underlying physiological abnormalities
in which seizures represent, to varying degrees, an infrequent phenomenon²
[37]. The symptoms of epilepsy are very diverse and depend on the region of
the brain that is affected. An abnormality may cause seizures via the same
mechanisms but affect the epileptic person differently depending on its
location [166]. What these mechanisms are, or what causes the onset and
subsequent termination of seizures, is not well understood.

²The word ‘seizure’ is often replaced with ictal, so that pre-seizure and
pre-ictal, seizure and ictal, post-seizure and post-ictal are all equivalent
terms. This use of terminology is avoided throughout this text. Furthermore,
the onset of a seizure must not be confused with the term epileptogenesis,
which refers to how the epileptic disorder began (e.g., malformation, trauma,
aging) as opposed to how individual seizures start.

Physiologically speaking, seizures are spatially a large-scale phenomenon in
which a network rather than a single neuron is necessary to sustain abnormal
activity. Most knowledge to date is derived from experiments on animals.
Although the applicability of such knowledge to humans is not 100%
understood [37], these experiments have identified the minimum conditions
under which a network of neurons can sustain seizure-like activity. The
network must [73]:


1. Be part of a sufficiently large population (at least thousands of neurons), 
2. Be relatively densely inter-connected, and 


3. Have excitatory synapses that for the most part function correctly. 


These conditions are not a result of the epilepsy itself, but instead exist 
naturally to sustain everyday activity. Seizures are the result of an abnormal- 
ity in this network that allows neural activity to become unchecked. Neurons 
become hyper-active and synchronous, that is, they fire action potentials at 
a much higher rate than is normal and at the same time as nearby neurons. 
This abnormal activity spreads or generalizes to other regions of the brain 
via the same mechanisms that allow normal function. Once a relatively large 
proportion of the brain is involved the hyper-active synchronous firing con- 
tinues in a waxing-waning manner resulting from resonances that occur due 
to the spatial distribution of network interconnections in the brain. These 
resonances can be linked to the geometry of the head. 

The offset or post-seizure period is also important for a complete
understanding of the epileptic brain: from a dynamical systems perspective,
how the brain transitions out of a seizure is just as relevant as the
transition into it. The most common belief is that a seizure ends because the
oxygen supply to the neurons is depleted, but other mechanisms are also
possible, such as changes in chemical concentrations (e.g., accumulation of
adenosine) or the spontaneous correction of the chemical imbalance responsible
for initiating the seizure.

There are so many different types of epilepsies with different causes,
symptoms and effects that it is difficult to categorize them. A system
proposed in 1981 (the International Classification of Seizure Types [1]) is
based on observation of clinical phenomena (explained in Section 1.2) as
opposed to the underlying pathology. The system involves more than 15
different sub-categories, but only three broad classifications are made:
partial or focal, generalized or non-focal, and continuous. The basic
assumption in this book is that the epilepsies within each group are generated
by similar mechanisms, and that differences are nuances easily explained
within each paradigm. The three categories are discussed next.


1.1.4.1 Focal Epilepsy — Failure of Meso-Scopic Networks 


Focal or partial epilepsy is caused by an abnormality in a specific part of the 
brain, usually in the form of a group of damaged or abnormal neurons [80]. 
Seizures do not necessarily start at the focus, but are the result of its presence 
and its influence on the network as a whole. If a focal seizure spreads to 
a large proportion of the brain it is said to become secondarily generalized. 
The initiation and spread of such seizures are believed to be a progressive (as
opposed to an abrupt) transition. This transition may sometimes seem abrupt
because the spread occurs very quickly.

A review of histology, that is, how the cellular and network properties of
neurons in the focus vary from the ‘normal’ brain, is presented in [12]. The
authors report that the abnormal tissue can vary in morphology (e.g., enlarged
neurons), connectivity (e.g., abnormal distribution of dendritic connections,
often reduced inhibitory synapses) and excitability (e.g., abnormal neurons are
more prone to fire action potentials, or their responses are larger than in
‘normal’ neurons). However, the work they present is limited to cases for
which samples are available from surgical removal, that is, to epilepsies that
are both resistant to medication and operable (see Section 1.1.5). Often it is
not possible to know whether the observed abnormalities are the cause or the
effect of recurrent seizures. Furthermore, the acquisition of ‘normal’ tissue
samples from healthy humans is unethical, and controlled comparisons are
scarce. The abnormalities listed here are thus only examples and possibly
unrepresentative of all epilepsies.

In any case, when the brain is not seizing these neurons cannot usually
participate in normal activity and are kept under control by strong inhibition
provided by surrounding neural networks [37, 166]. The abnormal levels of
inhibitory activity suggest that the epileptic brain behaves differently from
a ‘normal’ brain even between seizures [91, 37]. Seizures are the result of a
breakdown in the mechanisms that keep the epileptic neurons in check [118].
Once control of these abnormal neurons is lost they become hyper-active,
firing massive bursts of action potentials that can recruit and entrain
otherwise healthy neurons from nearby or remote areas.

To understand focal epilepsy two processes must be explained:


1. Initiation: What makes the focus become epileptic? In other words, what
causes the inhibitory mechanisms to fail and allow runaway activity? Is it a
network phenomenon, a chemical imbalance, or perhaps a network phenomenon
that leads to a chemical imbalance? The abnormalities in the focus are likely
involved, probably through the balance between their increased excitability
and the failure of inhibitory mechanisms. Alternatively, it has been suggested
that the inhibitory mechanisms themselves can cause synchronization in the
abnormal neurons; thus the inhibitory process may, at least in some
epilepsies, be responsible for both the control and the cause of the seizure
[12]. Understanding the initiation of a seizure may lead to the ability to
predict its onset.


2. Generalization: How does the seizure spread? It is known that the spread
is the result of the inter-connectivity that allows the brain to function
normally, but understanding the spreading mechanisms could potentially be
used to provide more appropriate treatment. Other questions ensue, for
example: is the role of the thalamus passive, in that it simply mediates
faster spreading of activity, or is it actively involved in the seizure? From
a clinical perspective it is important to understand how much a seizure must
spread before it is detectable³.


The effects of focal epilepsy can vary. Simple seizures are those that cause 
relatively mild cognitive, psychic, sensory or autonomic symptoms and no 
interruption to consciousness, whilst complex seizures are those that spread 
to cause altered states of consciousness, complex automatic or convulsive be- 
havior. The location of the focus affects the experience of the seizure. For 
example if the focus is centered on the region of the brain responsible for 
smell, then this sense is affected during a seizure. Temporal lobe seizures are 
known to produce mystical experiences. 

Some focal epilepsies appear non-focal because traditional imaging can- 
not locate the abnormalities. However others are known to be non-focal, as 
discussed next. 


1.1.4.2 Non-Focal Epilepsy — Failure of Macro-Scopic Networks 


Primarily generalized or non-focal epilepsies have seizures whose onset
manifests across the entire brain immediately⁴. The time at which the sudden
change from ‘normal’ to epileptic activity occurs can be likened to a
bifurcation point, because at this time the system's behavior becomes,
qualitatively, very different⁵. The mechanisms that explain how the brain
reaches this bifurcation point are even less understood than for focal
seizures. Whereas the focus is known to be responsible for focal seizures, the
explanations sought for non-focal epilepsy place emphasis on understanding the
origin of the activity that leads to the actual seizure:


1. Initiation: What activity in the brain is responsible for the initiation 
of a seizure? More than likely it is a combination of (a) balance of 
chemistry (neuro-transmitters and neuro-modulators) and (b) physiol- 
ogy (histology, network connectivity, delays and resonances). 


2. Spread and Generalization: If the processes responsible for the
generation of a seizure are known, what leads the brain to the bifurcation
point? Is it changes within the brain or an external stimulus that drives this
rapid change? Does the thalamus play an active role? Some believe the origin
of generalized seizures to be thalamic because the thalamus is very well
connected to all parts of the brain and thus allows for an almost
instantaneous spread [80]. Nevertheless these questions remain inadequately
answered.
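To make the loosely used notion of a bifurcation more concrete, the sketch below integrates the normal form of a supercritical Hopf bifurcation, a textbook dynamical system in which a single parameter `mu` switches the long-run behavior from a quiet fixed point (mu < 0) to a sustained oscillation of amplitude sqrt(mu) (mu > 0). It is offered only as an illustration of the concept and is not a seizure model; the choice of a Hopf (rather than, say, a saddle-node) bifurcation, the function name `hopf_amplitude` and the parameter values are assumptions.

```python
def hopf_amplitude(mu, omega=2.0, dt=0.001, steps=100_000, z0=0.1 + 0j):
    """Final oscillation amplitude of dz/dt = (mu + i*omega) z - |z|^2 z.

    Forward-Euler integration from a small initial perturbation z0.
    For mu < 0 the amplitude decays to zero; for mu > 0 it grows and
    settles on a limit cycle of radius close to sqrt(mu).
    """
    z = z0
    for _ in range(steps):
        z += dt * ((mu + 1j * omega) * z - abs(z) ** 2 * z)
    return abs(z)


for mu in (-0.5, -0.1, 0.1, 0.25):
    print(f"mu = {mu:+.2f}: amplitude {hopf_amplitude(mu):.3f}")
```

The amplitude is essentially zero for every negative `mu` and grows from zero as roughly sqrt(mu) once `mu` crosses zero (forward Euler adds a small bias). The qualitative change from rest to oscillation happens at a single parameter value, which is what a bifurcation point means in the rigorous sense.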


³Many measurements/observables can be used to detect seizures, but in this
book we limit ourselves to detectability by EEG measurements, that is, a
seizure is detectable when it is observed in the EEG measurement. The EEG is
described in Section 1.2.

⁴Focal seizures may sometimes appear primarily generalized, but it is believed
that in these cases the spread is so fast that it seems instantaneous.

⁵Here the term ‘bifurcation’ is used loosely. A more rigorous definition exists
in dynamical systems theory. However, at least qualitatively, it is the same
as that in the mathematics.
The variety of generalized seizures suggests that their cause is due to
intrinsic properties of the brain as a whole rather than a consequence of a
subset of neurons. For example, absence seizures usually cause a very short
interruption of consciousness, during which the person appears vacant. This
type of seizure is most common in children and to a large extent disappears
with adolescence. What changes are experienced by the brain during this
maturation period? Changes in hormones? Re-structuring of connections? Which
of these could be responsible for abating the seizures? On the other hand,
other generalized seizures develop with maturation: are the same processes
that stop epilepsy in some brains responsible for its development in others?
Can a brain be said to be predisposed to epilepsy given a histological
difference from a ‘normal’ brain? If so, what are these differences, and how
different does a brain need to be to inherit this predisposition?


1.1.4.3 Continuous Epilepsy 


Also known as status epilepticus, continuing seizures describe a state in which 
there is no observable recovery between seizures. This is a very dangerous form 
of epilepsy that can become life-threatening even 5 minutes after convulsions 
have started because the inability to sustain metabolic demand (oxygen, sug- 
ars) can cause brain damage. More than 30 minutes of convulsions can lead 
to death [80]. 

Since status epilepticus is thought of as a continuing form of either focal 
or non-focal seizures it is not discussed in any more detail here. 


1.1.5 Diagnosis and Treatment of Epilepsy 


Despite the many imaging techniques that allow visualization of brain
structure and functionality (see, for example, Section 1.2), these alone are
not sufficient to diagnose epilepsy. The presence of abnormal focal tissue
does not necessarily cause seizures if it is located in a region of the brain
that is insufficiently connected to other parts of the brain, and activity
that resembles seizures may not be epileptic in origin. Such data must be
coupled with a patient's medical history, including genetic predisposition, as
well as the nature of the symptoms so that a diagnosis may be made. The
symptoms themselves (e.g., smell, dizziness, motor impairment) may provide
physicians with information about the location of a seizure focus. Often it
is necessary to induce a seizure; in these cases seizures are provoked by
applying stress to the patient in the form of hyperventilation, sleep
deprivation or, when appropriate, flashing lights.
Once diagnosed, treatment can begin⁶. The most common form of treatment is
pharmacological. Fixed doses of anti-epileptic medication administered daily
are used to control seizures. This works to varying degrees for two-thirds of
the epileptic population. Surgical removal of the damaged neurons in focal
epilepsy may be suitable for a further 8% of patients but can result in
irreparable damage to other brain functions. This leaves roughly a quarter of
sufferers with no viable options, for whom other forms of treatment such as
electrical stimulation are being investigated.

⁶Throughout history treatments have ranged from the mystical (black hellebore,
oak mistletoe, valerian) to the strange (swallowing the heart of a
rattlesnake, wearing the head of a cuckoo around the neck) to the downright
ridiculous (sleeping over a cow stable) [125]. Today treatments are based on
scientific experimentation, although a good element of experience remains in
the way they are administered.


1.1.5.1 Anti-Epileptic Drugs 


Anti-epileptic drugs (AEDs) attempt to curtail epilepsy by affecting the
chemistry driving the cellular processes underpinning it. The problem with
AEDs is that it is difficult to tailor drugs to suit the particular
circumstances of the individual case when the actual chemical origin of the
epileptic condition is not well understood. Moreover, drugs necessarily affect
the entire brain and have an impact beyond it. Their side effects are many and
varied, and cannot always be predicted [133]. To overcome these side effects,
AEDs that can be delivered locally and are fast acting, with a limited time
window of effectiveness, would be preferable. However, at present such AEDs
are purely experimental.

The effects of most drugs are complex and vary with age, medical history, 
genetic background as well as the type of epilepsy. As a result, achieving the 
correct dosage level is often based on empirical knowledge and administered 
through trial and error (under tight medical supervision). Drugs that have 
shown success on a wide scope of epilepsies are tried first, with increasing 
dosages. If a drug proves inadequate a new drug is tested, at different con- 
centrations, and so the process goes on. Each patient requires a tailor-made 
solution, and dosage adjustments should only be made when the clinical need 
exists [133]. The correct cocktail of AEDs must balance the reduction in
seizures against the resulting side effects (e.g., dizziness, weight changes,
cognitive impairment, rashes), which can sometimes impair quality of life almost
as much as the seizures themselves. Until the mechanisms of epilepsy are
more clearly understood, or fast-acting drugs can be delivered only to the 
regions of brain that are involved, the administration of AEDs continues to 
be a complicated process. 


1.1.5.2 Surgical Resection 


If more than three AEDs fail to lessen the number of seizures significantly,
the patient is deemed medically refractory and more drastic measures are
considered. For focal epilepsies this could mean the surgical removal, or
resection, of the epileptic focus so that the damaged neurons are no longer
involved in the network activity. This may seem a radical concept, but it is
certainly not a new one: stone age cave paintings have been found in France
that suggest primitive forms of cranial surgery were used as treatment even
then. Clearer records exist from medieval times, when burning of the back of the
head (cauterization) was sometimes applied to healthy children to prevent the
disorder [125, 180]. Today mesial temporal lobe epilepsy is the most common
drug-resistant epilepsy in adults, and thus the epilepsy for which most
surgeries are performed.

It is remarkable that so much redundancy is built into the brain that removing
a significant proportion of it can have relatively minor effects. Some may
argue that epileptic foci are not used by the brain, so that their removal most
likely alleviates the burden of maintaining control; but there are severe cases
in which an entire hemisphere is resected and motor control is re-learned by
the remaining half. In any case, even the smaller surgical procedures are risky
and are considered only when no other option exists.


1.1.5.3 Electrical Stimulation 


When neither AEDs nor surgery can help, the situation is desperate for the
patient and alternative forms of management are necessary. Attention is shifting
to the application of electrical stimuli to control or abort seizures. This may
seem counter-intuitive given the well-known ability of electricity to promote
rather than reduce seizures [32], but in certain cases stimulation has been
shown to do the opposite.

Vagal nerve stimulation (VNS) is, to date, the most common stimulation
procedure used to control epileptic activity. Again, a recurring theme is that
the mechanisms of operation are not understood. The rationale behind VNS is
that, since 90% of the nerve fibres in the vagus nerve project toward the brain,
brain activity can be modulated indirectly by the applied stimulus [9, 116].
Why these changes may be anti-epileptic is not known, but correlations between
blood flow to the thalamus and efficacy have been observed [116]. It has also
been speculated that VNS indirectly activates inhibition through the release of
serotonin [9]. No significant changes in the electrical activity of the brain
are observed when the VNS is active, even though such changes are obvious in
animal experiments [116].

The first human VNS implant took place in 1988, and hundreds of thousands of
procedures have been performed since. How effective the implant is remains
unclear: double-blind trials with control groups, comparing seizure frequency
before and after implantation, report results that vary significantly. An
observed trend is that in the short term a 50% reduction in seizures is achieved
in 20-30% of cases, and this figure increases to 40-50% in the long term
[9, 116]. The implant remains effective years after activation [20]. Seizures
cease completely in only 1-2% of cases [9], whilst no effect is observed in
about 35% of cases [107]. These figures must be interpreted carefully because
the epilepsies under test are, by definition, those that have proven
particularly difficult to treat.
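The ‘50% reduction’ figures above are usually reported as a responder rate: the fraction of patients whose seizure frequency falls by at least half. The sketch below shows that calculation, assuming per-patient seizure counts over equal-length baseline and follow-up periods. The function name `responder_rate` and all the counts are hypothetical, not data from the studies cited.

```python
def responder_rate(baseline, follow_up, reduction=0.5):
    """Fraction of patients whose seizure count fell by at least `reduction`.

    baseline, follow_up: seizure counts per patient over equal-length periods.
    Patients with no baseline seizures are skipped (no reduction is defined).
    """
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must have the same length")
    responders = evaluated = 0
    for before, after in zip(baseline, follow_up):
        if before <= 0:
            continue
        evaluated += 1
        if (before - after) / before >= reduction:
            responders += 1
    return responders / evaluated if evaluated else 0.0


if __name__ == "__main__":
    # Hypothetical monthly seizure counts for ten patients.
    before = [20, 12, 30, 8, 15, 40, 10, 25, 18, 9]
    after = [8, 11, 28, 3, 14, 35, 0, 24, 17, 9]
    print(f"50% responder rate: {responder_rate(before, after):.0%}")
```

With these made-up counts, three of the ten patients (including one who becomes seizure-free) qualify, giving a 30% responder rate.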

VNS is an ‘always on’ treatment that may affect a large region of the brain,
not necessarily the part that requires it. The long-term clinical consequences
of this appear to be minor, but they remain unknown nonetheless. The negative
side-effects of VNS (e.g., sore throat, headache) are relatively minor and
decrease significantly with time. Positive side-effects, such as elevated mood
and alertness [38, 116] and improved quality of life [107], have led to VNS
being used in the treatment of other disorders, including depression. The major
drawback of VNS is that it is not known how it works, let alone whether it will
work for a given patient. In any case, VNS is a form of seizure reduction rather
than elimination.


The direct application of electrical current to subcortical structures of the
brain, also known as deep-brain stimulation (DBS), is currently being evaluated
for clinical application. The structures under study include the cerebellum,
because it is known to promote inhibitory activity; the thalamus, because of its
widespread connectivity; and the hippocampus, for its involvement in temporal
lobe epilepsy [116]. Studies show performance comparable to that of VNS.


The question of how the stimulus should be applied is open for debate. Both VNS
and DBS traditionally use continuous stimuli in an on-off manner (e.g., slow
mode: 30 seconds on, 3-5 minutes off; rapid mode: 7 seconds on, 12 seconds off
[116]). Studies considering the frequency and shape of the stimulus are rare,
most likely because tests are, again, empirical rather than based on an
understanding of the epilepsy that would allow optimal selection of parameters.
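The on-off schedules quoted above translate directly into a duty cycle, the fraction of time the stimulator is active. This is an illustrative sketch only. It assumes a simple repeating on/off cycle that starts in the on-phase and ignores any ramping of the stimulus and the pulse parameters within the on-phase. `duty_cycle` and `is_on` are hypothetical helpers.

```python
def duty_cycle(on_s, off_s):
    """Fraction of time the stimulator is on for a repeating on/off cycle."""
    return on_s / (on_s + off_s)


def is_on(t_s, on_s, off_s):
    """True if a cycle that starts 'on' at t = 0 is in its on-phase at time t_s."""
    return (t_s % (on_s + off_s)) < on_s


if __name__ == "__main__":
    # Slow mode: 30 s on, 3-5 min off; rapid mode: 7 s on, 12 s off.
    for label, on, off in [("slow, 3 min off", 30, 180),
                           ("slow, 5 min off", 30, 300),
                           ("rapid", 7, 12)]:
        print(f"{label:16s} duty cycle = {duty_cycle(on, off):.1%}")
```

Run as a script, it reports about 14.3% and 9.1% for the two slow-mode settings and 36.8% for rapid mode. Rapid mode therefore keeps the stimulator on roughly 2.5 to 4 times as long as slow mode.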


A recent shift in research tries to abort seizures rather than control them 
by applying the stimulus only once the seizure has begun. The main advantage 
of this is that continuous stimulation is not necessary, side effects are mini- 
mized and battery life is significantly extended. The problem is that reliable 
detection or prediction of these seizures to initiate the treatment process is 
very difficult, as discussed in later sections, and that it is not known whether 
it is possible to abort the seizure once it has begun. Promising studies show 
that seizure duration can be altered, but much research must be completed 
before this can be clinically evaluated [189]. 
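In a responsive scheme, stimulation is tied to a detector rather than to a clock. The minimal sketch below assumes some per-window detection statistic and a fixed threshold, both hypothetical. Choosing them reliably is exactly the open problem described above. The stimulator fires once at each rising edge of the thresholded statistic instead of running continuously.

```python
import numpy as np


def trigger_onsets(feature, threshold):
    """Return the window indices at which a responsive stimulator would fire.

    feature:   1-D sequence of a (hypothetical) seizure-detection statistic,
               one value per analysis window.
    threshold: value above which a seizure is assumed to be detected.
    The stimulator fires once at the first window of every run of
    supra-threshold windows (a rising edge), not continuously.
    """
    above = np.asarray(feature) > threshold
    if above.size == 0:
        return np.array([], dtype=int)
    # State of the previous window; the window before the first counts as "below".
    prev = np.concatenate(([False], above[:-1]))
    return np.flatnonzero(above & ~prev)


if __name__ == "__main__":
    feature = [0.1, 0.2, 1.5, 1.7, 0.3, 0.2, 2.0, 0.1]
    print(trigger_onsets(feature, threshold=1.0))   # windows 2 and 6
```

Over these eight windows, stimulation is delivered only twice, at windows 2 and 6. The duty cycle is then set by how often seizures are detected rather than by a fixed schedule.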


1.2 The EEG - A Recording of the Brain 


To obtain much of the information required to diagnose epilepsy, the
electroencephalogram (EEG), literally meaning ‘an electrical recorder of what is
inside the head’, has proved invaluable. It provides a measurement of the
electrical activity in the brain, translating ionic currents into voltage
recordings. It has high temporal resolution, in that it can characterize fast
changes in current flow, but poor spatial resolution, because measurements are
limited by the number of electrodes, their placement and the properties of the
head. Voltage recordings were first demonstrated on monkeys in 1875 by the
British neurophysiologist Richard Caton, but the practice did not become
clinically viable until the 1920s, when non-invasive recording became possible
[103].


Measurements can be made at different spatial scales. Macro-scopic records
are obtained (relatively) non-invasively from the scalp, or through surgical
procedures that allow recording from within the head, that is, intra-cranially.
Each has its advantages and disadvantages:


1. Scalp EEG: The signal must propagate through several layers of non-neural
tissue, namely cerebrospinal fluid, skull and scalp, each affecting it in
different ways. Recordings at the scalp are heavily attenuated, so much larger
regions of the brain must be synchronously active for an EEG signal to
register. The procedure is easy and inexpensive. As a diagnostic tool it
sometimes provides sufficient information on its own, and at other times serves
as a preliminary step to more detailed intra-cranial records. Standard
electrode positioning systems exist to make records more comparable within and
between patients. The international 10-20 system for electrode placement is
explained in Figure 1.8(a).


2. Intra-cranial EEG: Records can be taken from cortical electrodes (those
placed on the cortex) or depth electrodes (those that penetrate to subcortical
structures such as the thalamus). The intra-cranial EEG records at smaller
spatial scales, but the phenomena recorded are still macro-scopic, or at least
meso-scopic.† Intra-cranial records are often obtained for pre-surgical
analysis to determine the regions of the brain to be resected. Such procedures
are relatively rare, and data of this nature are more difficult to obtain.
Standardization of electrode placement is also more difficult because
decisions are made on a patient-by-patient basis.
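The ‘10’ and ‘20’ in the name of the 10-20 system refer to percentages of measured head distances. The sketch below illustrates the idea for the midline only, assuming a hypothetical 36 cm nasion-to-inion distance. The midline electrodes sit at 10%, 30%, 50%, 70% and 90% of that distance from the nasion, so the end electrodes are 10% from the landmarks and adjacent electrodes are 20% apart. Lateral positions are derived from other measured distances in the same way. `midline_positions` is a hypothetical helper.

```python
# Midline electrodes of the 10-20 system as fractions of the nasion-inion distance.
MIDLINE_FRACTIONS = {"Fpz": 0.10, "Fz": 0.30, "Cz": 0.50, "Pz": 0.70, "Oz": 0.90}


def midline_positions(nasion_inion_cm):
    """Distance over the scalp (cm, from the nasion) of each midline electrode."""
    return {name: frac * nasion_inion_cm for name, frac in MIDLINE_FRACTIONS.items()}


if __name__ == "__main__":
    for name, dist in midline_positions(36.0).items():
        print(f"{name}: {dist:.1f} cm from nasion")
```

Because every position is a fixed fraction of a measured distance, the same electrode names refer to comparable locations on heads of different sizes.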


In all cases the analog signals generated by the brain are passed to inter- 
facing machinery responsible for amplifying, filtering and digitizing the data 
before relaying it to a computer for storage and analysis (see Figure 1.2). The 
technical specifications of the machinery play a role in the integrity of the 
recorded signal. One recording system is presented in Chapter 5. The posi- 
tioning of the electrodes and the referencing system used play a critical role 
in determining the EEG. Two referencing systems for scalp-recorded EEG are 
shown in Figure 1.8. 
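Referencing determines what each recorded channel represents. One common scheme, sketched below under idealized assumptions, forms bipolar derivations as differences between neighbouring electrodes that were all recorded against a common reference. Any signal identical at both electrodes cancels; here that is a synthetic 50Hz term. The `bipolar` helper and the left-hemisphere front-to-back chain are illustrative choices, not necessarily the exact montages drawn in Figure 1.8.

```python
import numpy as np

# Left-hemisphere front-to-back chain (an illustrative choice of pairs).
LEFT_CHAIN = [("Fp1", "F3"), ("F3", "C3"), ("C3", "P3"), ("P3", "O1")]


def bipolar(monopolar, pairs):
    """Form bipolar derivations from referential (common-reference) recordings.

    monopolar: dict mapping electrode name -> 1-D array of samples, each
               recorded against the same reference.
    pairs:     sequence of (a, b) electrode names; output channel "a-b" is a - b.
    """
    return {f"{a}-{b}": np.asarray(monopolar[a]) - np.asarray(monopolar[b])
            for a, b in pairs}


if __name__ == "__main__":
    fs = 256                                   # assumed sampling rate (Hz)
    t = np.arange(0, 1, 1 / fs)                # one second of samples
    rng = np.random.default_rng(0)
    # Synthetic "local" activity at each electrode...
    local = {name: 0.1 * rng.standard_normal(t.size)
             for name in ["Fp1", "F3", "C3", "P3", "O1"]}
    # ...plus interference that is identical at every electrode.
    common = np.sin(2 * np.pi * 50 * t)
    recorded = {name: sig + common for name, sig in local.items()}

    derived = bipolar(recorded, LEFT_CHAIN)
    print(list(derived))                       # ['Fp1-F3', 'F3-C3', 'C3-P3', 'P3-O1']
    # The common term cancels: each derivation is a difference of local activity.
    print(np.allclose(derived["F3-C3"], local["F3"] - local["C3"]))   # True
```

In real recordings interference is never perfectly identical across electrodes, so the cancellation is only approximate. A bipolar channel also emphasizes differences between neighbours and suppresses activity they share.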

For the most part, the EEG measures the potential differences induced by
currents flowing due to EPSPs and IPSPs, not action potentials. This is
because the total field potential of a group of neurons more or less equals the
sum of the field potentials of the individual neurons, and contributions add up
only when they overlap in time. Action potentials are of shorter duration than
EPSPs and IPSPs, so they overlap far less and their summed contribution is
small [118, 122]. Many thousands of neurons (10^4 to 10^7) must behave in
synchrony to generate signals large enough to register on the EEG. This is
particularly true for scalp recording, where propagation through to the scalp
attenuates and filters the signal.
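The argument about overlap can be checked with a toy simulation. The sketch assumes that each neuron contributes an identical rectangular unit pulse, a crude stand-in for a real field potential, and that contributions simply add. `summed_peak` is a hypothetical helper. With 1,000 neurons whose events are spread over 100 ms, short AP-like events (1 ms) barely overlap, while longer PSP-like events (20 ms) pile up. Perfectly synchronous events give the full sum.

```python
import numpy as np


def summed_peak(n_neurons, event_ms, jitter_ms, dt_ms=0.1, seed=0):
    """Peak of the summed signal from n_neurons identical rectangular events.

    Each neuron contributes one unit-amplitude pulse lasting event_ms, with its
    onset drawn uniformly from [0, jitter_ms). jitter_ms = 0 means perfect
    synchrony. Contributions are assumed to add linearly.
    """
    rng = np.random.default_rng(seed)
    width = int(round(event_ms / dt_ms))             # pulse length in samples
    if jitter_ms > 0:
        onsets = rng.uniform(0.0, jitter_ms, n_neurons)
    else:
        onsets = np.zeros(n_neurons)
    starts = (onsets / dt_ms).astype(int)
    # Difference array: +1 where a pulse starts, -1 where it ends; the running
    # sum is the number of pulses overlapping each sample.
    diff = np.zeros(int(np.ceil(jitter_ms / dt_ms)) + width + 2)
    np.add.at(diff, starts, 1.0)
    np.add.at(diff, starts + width, -1.0)
    return np.cumsum(diff).max()


if __name__ == "__main__":
    n = 1000
    print("synchronous 1 ms events:   ", summed_peak(n, event_ms=1, jitter_ms=0))
    print("1 ms events over 100 ms:   ", summed_peak(n, event_ms=1, jitter_ms=100))
    print("20 ms events over 100 ms:  ", summed_peak(n, event_ms=20, jitter_ms=100))
```

The synchronous case peaks at exactly 1,000, and the 20 ms events peak many times higher than the 1 ms events with the same timing spread. This is the sense in which slow, overlapping PSPs dominate the summed field.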

Even so, it is reasonable to expect EEG signals to be non-random because 


† The procedures for micro-scopic recordings involve very different
considerations and are not discussed in this book. In any case these are no
longer referred to as EEG, since EEG always measures ensembles of neurons.
FIGURE 1.8: International 10-20 system for placement of scalp EEG electrodes.
In (a) the positions and standard naming of the electrodes are shown.
The advantage of this placement system is that the locations of the electrodes 
are computed as percentages of standard distances. In this way records are 
comparable between patients. Figures (b) and (c) show two standard ways in 
which the measured signals are referenced to each other. These referencing 
systems highlight relative activity between the front and back of the head (in 
(b)) or between the left and right hemispheres (in (c)). 


neurons in functionally related areas, particularly in the cerebral cortex,
generate activity that is similar to that of other nearby neurons. Sub-cortical
networks are assumed to have little effect on scalp EEG recordings [118]. The
EEG reflects the global dynamics of the electrical activity of large
populations of neurons, which is why it is so useful in the diagnosis of
epilepsy. A more in-depth analysis of EEG measurement for cortical and scalp
records is presented in Chapter 2. What follows is an overview of what
‘typical’ EEG records look like.


1.2.1 The Normal EEG 


There is both art and science involved in the interpretation of EEG. No global 
definition of what an EEG looks like (or should look like) exists. Changes are 
evident at different stages of human life, different levels of awareness (sleep, 
awake) and different modes of behavior (eyes open, eyes closed). Electroen- 
cephalography defies standardization even within these states and hence it is 
difficult to train an expert (human or computer). 

Even with the extensive documentation available the lack of a ‘normal’ 


Introduction 23 


[Figure 1.9 panels: (a) Alpha rhythm; (b) Early sleep stage; (c) Deep sleep stage; (d) Sample artifact, showing chewing, lead noise and eye blinks. Axes: relative voltages against time (seconds).]


FIGURE 1.9: Example of ‘normal’ EEG traces. Channel voltages are shown
relative to each other. Panels (a)-(c) highlight the differences between
states of alertness. In (a), an example of an awake alpha rhythm shows the
10Hz activity present in only some (posterior) channels. In contrast to the
slow waveforms of the sleep stages in (b) and (c), the awake EEG shows much
more variability between channels, demonstrating the more global nature of
sleep versus awake states. Sample artifacts are shown in (d).
EEG means that the most an electroencephalographer can do is to visually
recognize general patterns that exist consistently in the majority of the pop- 
ulation. It is important to remember that the absence of such patterns does 
not necessarily imply abnormality. This is why the EEG alone is not sufficient 
for diagnosis of epilepsy, and other forms of observation are necessary. 


A variety of representative EEG segments can be found in Figure 1.9, 
each showing at least four electrodes so as to demonstrate the global nature 
of the EEG measurement. Typical rhythms or patterns have been identified
in the ‘normal’ EEG to impose some order on an otherwise seemingly random
environment. The most common of these, the alpha rhythm in (a), is an 8-13Hz
waveform occurring during wakefulness over the posterior regions of the
head. It is best seen with eyes closed, is present in most healthy adults,
and is often interpreted as an indication of mental maturation and health.
But even within this commonly observed phenomenon there are large variations
in voltage, spread and quality. Perfectly healthy adults have been found with
no demonstrable alpha rhythm [118]. The alpha rhythm is a global phenomenon
because it requires large networks to exist, but there is debate as to whether
it originates from resonances in the cortico-cortical or thalamo-cortical loops.
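A common quantitative counterpart to spotting the alpha rhythm by eye is the fraction of signal power in the 8-13Hz band. The sketch below uses a plain FFT periodogram on synthetic signals: a 10Hz and a 3Hz sinusoid plus noise, sampled at an assumed 256Hz. `band_power_fraction` is a hypothetical helper. In practice an averaged spectral estimate, such as Welch's method, would be less noisy than a single periodogram.

```python
import numpy as np


def band_power_fraction(x, fs, low_hz, high_hz):
    """Fraction of total spectral power (excluding DC) between low_hz and high_hz.

    Uses a single FFT periodogram of the whole segment.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                                  # remove DC offset
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    band = (freqs >= low_hz) & (freqs <= high_hz)
    return power[band].sum() / power[1:].sum()


if __name__ == "__main__":
    fs = 256                                          # assumed sampling rate (Hz)
    t = np.arange(0, 4, 1 / fs)                       # 4 s segment
    rng = np.random.default_rng(0)
    noise = 0.2 * rng.standard_normal(t.size)
    alpha_like = np.sin(2 * np.pi * 10 * t) + noise   # 10 Hz, inside the alpha band
    slow = np.sin(2 * np.pi * 3 * t) + noise          # 3 Hz, outside it
    for name, sig in [("10 Hz", alpha_like), ("3 Hz", slow)]:
        print(name, f"{band_power_fraction(sig, fs, 8, 13):.2f}")
```

The 10Hz signal puts most of its power in the band, while the 3Hz signal puts almost none there. As the text stresses, though, a low value does not by itself indicate abnormality.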


The different stages of sleep produce an EEG that is very different from
the awake EEG. Examples are shown in Figure 1.9(b) and (c). The waveforms
are slower and more global in nature, that is, the channels resemble each
other more, although there are cases in which fast activity and sudden spikes
are observed during sleep. These are normal phenomena, but they interfere with
the identification of the different stages.


Artifacts are noise on the EEG, particularly strong in scalp recordings,
caused by activity that does not originate in the brain but that cannot be
removed at the time of recording. These can be external, e.g., ambient
electromagnetic interference (50Hz in Australia, 60Hz in the USA) or an
improper electrode-scalp junction, or physiological, e.g., eye blinks, chewing,
activity of the scalp musculature (EMG) and, less commonly, heartbeats [118].
Examples of some of these phenomena can be found in Figure 1.9(d). Although an
electroencephalographer can quite easily be trained to recognize and categorize
these artifacts, their removal presents a challenge in digital analysis because
it is difficult to separate the artifact without also affecting the measurement
of true neural activity. Techniques to separate true neurological activity from
such interference are the subject of much research but are not yet
sophisticated enough to be clinically applicable (e.g., [150]).
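Mains interference is the easiest artifact to suppress, and it also illustrates the trade-off described above. The sketch uses SciPy's `iirnotch` and `filtfilt`, with an assumed 256Hz sampling rate and a quality factor of 30, both illustrative. The notch removes the 50Hz interference (60Hz in the USA), but it would equally remove any genuine neural activity in that narrow band. `remove_mains` is a hypothetical helper.

```python
import numpy as np
from scipy.signal import filtfilt, iirnotch


def remove_mains(x, fs, mains_hz=50.0, quality=30.0):
    """Suppress mains interference with a zero-phase IIR notch filter.

    mains_hz: 50 in Australia, 60 in the USA.
    quality:  notch quality factor; higher values give a narrower notch.
    """
    b, a = iirnotch(mains_hz, quality, fs=fs)
    return filtfilt(b, a, x)        # forward-backward filtering: no phase shift


if __name__ == "__main__":
    fs = 256.0                                   # assumed sampling rate (Hz)
    t = np.arange(0, 4, 1 / fs)
    neural = np.sin(2 * np.pi * 10 * t)          # stand-in for brain activity
    mains = np.sin(2 * np.pi * 50 * t)           # interference
    cleaned = remove_mains(neural + mains, fs)
    mid = slice(int(fs), int(3 * fs))            # ignore filter start-up at the edges
    err = np.sqrt(np.mean((cleaned[mid] - neural[mid]) ** 2))
    print(f"RMS error after notch: {err:.4f} (interference RMS was 0.7071)")
```

The error is evaluated away from the segment edges because the filter needs a short start-up time there. Genuine neural activity close to the mains frequency would be removed along with the interference.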


1.2.2 The Epileptic EEG 


Again the variability between epilepsies means that there is no single ‘epileptic’ 
EEG. For example, inter-seizure periods can be as short as a few seconds or 
as long as years. The general trends that occur during seizures have been
identified as follows:
[Figure 1.10 panel: (b) Sample epileptiform waveforms]


FIGURE 1.10: Sample epileptiform EEG. (a) shows a complete seizure of
approximately 110 seconds' duration, with its start and end as marked. The
magnitude of the EEG during the seizure is much larger than that preceding it.
Notice that the seizure evolves over time, with changes in morphology as well
as fundamental frequency. Notice also the artifact that occurs at about 73
seconds: this is an example of electrodes becoming temporarily disconnected
(probably due to convulsive movements). (b) shows some sample waveforms that
demonstrate the different ways in which a seizure can manifest. These are
examples only and are not representative of all the possibilities in the
different epilepsies. In fact, they are cases in which the seizures generalize
to all channels and are very easy to distinguish from ‘normal’ EEG. Other
seizures may involve only a subset of channels and be very difficult to
differentiate from background EEG.
• Synchronization: The EEG channels behave more like each other because the
neural activity is similar across a larger area of cortex. This can occur at
small scales and at large scales, that is, synchronization can be observed in
a few or in many EEG channels.

• Large amplitude: After seizure onset the EEG may become much larger in
amplitude than prior to the seizure. This can be because the cortex becomes
hyper-excitable (more neurons are active) or simply a consequence of the
increased synchronization between channels.

• Oscillations: Each channel often becomes more oscillatory, in contrast to
the examples of normal EEG shown in Figure 1.9. The oscillations are typically
associated with larger scales, that is, once epileptic activity has spread
sufficiently, because interacting regions of the brain are necessary to
sustain oscillations.
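Each of the three trends can be given a crude numerical counterpart:

• Synchronization: the mean pairwise correlation between channels.
• Amplitude: the RMS voltage.
• Oscillation: the share of spectral power in the single strongest frequency.

The minimal sketch below computes these on synthetic data, with independent noise as ‘background’ and a shared 5Hz sinusoid as ‘seizure’. `window_features` and the synthetic signals are hypothetical. As the text cautions, real seizures need not show any of these changes.

```python
import numpy as np


def window_features(eeg):
    """Crude indicators of the three seizure trends for one analysis window.

    eeg: array of shape (n_channels, n_samples).
    Returns (synchronization, amplitude, oscillation):
      synchronization - mean correlation over all pairs of distinct channels
      amplitude       - RMS voltage, averaged over channels
      oscillation     - share of each channel's spectral power (excluding DC)
                        in its strongest frequency bin, averaged over channels
    """
    eeg = np.asarray(eeg, dtype=float)
    corr = np.corrcoef(eeg)
    iu = np.triu_indices_from(corr, k=1)          # each channel pair once
    sync = corr[iu].mean()
    amp = np.sqrt(np.mean(eeg ** 2, axis=1)).mean()
    centered = eeg - eeg.mean(axis=1, keepdims=True)
    spectra = np.abs(np.fft.rfft(centered, axis=1))[:, 1:] ** 2
    osc = (spectra.max(axis=1) / spectra.sum(axis=1)).mean()
    return sync, amp, osc


if __name__ == "__main__":
    fs, n = 256, 512                              # assumed: 2 s windows at 256 Hz
    t = np.arange(n) / fs
    rng = np.random.default_rng(1)
    background = rng.standard_normal((4, n))      # independent noise per channel
    seizure = 5 * np.sin(2 * np.pi * 5 * t) + 0.3 * rng.standard_normal((4, n))
    for label, window in [("background", background), ("seizure-like", seizure)]:
        s, a, o = window_features(window)
        print(f"{label:12s} sync={s:.2f} amplitude={a:.2f} oscillation={o:.2f}")
```

All three numbers rise sharply for the seizure-like window. This is the easy, generalized case. A seizure confined to a few channels, or one without an amplitude increase, would move only some of them.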


These trends are representative only, and are often recognized only because
of a sharp change from background activity. A seizure does not necessarily
display all, or in fact any, of these features. It is difficult to identify a
single characteristic that is common to all types of epilepsy. For example,
although many seizures involve large-amplitude oscillations, there are others
whose amplitude is no larger than that of the preceding activity; channels
become synchronized during seizures, but not all channels are necessarily
involved in an episode; seizure onset and offset are usually abrupt, but some
seizures develop more gradually; and after the seizure there may or may not be
a period of ‘silent’ EEG, with noticeably decreased activity, during which the
patient is in a state much like deep sleep. This variation in observables
occurs between patients, between seizures in the same patient and within a
single seizure.

Figure 1.10 shows sample EEG of ‘typical’ seizures. Of most importance is the
evolution shown in (a), where the seizure changes in fundamental frequency as
well as morphology over time. (b) shows some examples of different seizure EEG
waveforms. In all cases the activity in each channel is more oscillatory than
normal, although the shape of the oscillations varies, and the seizure has
generalized to involve all channels. Notice also how similar the seizure EEG in
the leftmost panel is to the chewing artifact shown in Figure 1.9(d). This
highlights the difficulty of differentiating between what is ‘normal’ and what
is ‘epileptic’.

Inter-seizure epileptiform discharges, known as spikes, can sometimes be 
observed. Characteristically these are short bursts of high amplitude, synchro- 
nized and multi-phasic activity (in which a change in polarity occurs several 
times) that manifest themselves at or around the epileptic focus and stand out 
from the background EEG. Some believe that full blown seizures may simply 
be prolonged versions of these spikes [118, 158], although this is contentious. 
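A toy version of the ‘stands out from the background’ idea flags samples that deviate by more than several robust standard deviations of the channel. The spread is estimated with the median absolute deviation, so the spikes themselves barely inflate it. This is a sketch under stated assumptions, not a clinical spike detector, which would also consider morphology and spatial field. The assumptions are a single channel, an arbitrary threshold of 5 and a synthetic multi-phasic transient. `spike_candidates` is a hypothetical helper.

```python
import numpy as np


def spike_candidates(x, k=5.0):
    """Indices of candidate spikes: excursions beyond k robust SDs of the background.

    The background spread is estimated from the median absolute deviation (MAD),
    scaled by 1.4826 so that it matches the standard deviation for Gaussian noise;
    unlike the ordinary SD it is hardly affected by a few large transients.
    Each run of consecutive supra-threshold samples yields one candidate, located
    at the largest excursion within the run.
    """
    x = np.asarray(x, dtype=float)
    centered = x - np.median(x)
    sigma = 1.4826 * np.median(np.abs(centered))
    above = np.abs(centered) > k * sigma
    candidates = []
    i = 0
    while i < x.size:
        if above[i]:
            j = i
            while j < x.size and above[j]:
                j += 1                           # extend to the end of this run
            candidates.append(i + int(np.argmax(np.abs(centered[i:j]))))
            i = j
        else:
            i += 1
    return candidates


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(2000)                  # background with unit SD
    transient = np.array([6.0, 15.0, -9.0, 3.0])   # short multi-phasic burst
    x[500:504] += transient
    x[1500:1504] += transient
    print(spike_candidates(x))                     # expected to include 501 and 1501
```

Each transient changes polarity within a few samples, but it is reported once, at its largest excursion.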

Given the large variability in observations the correct classification of the 
epileptic EEG is a difficult if not impossible task. Classification of the epileptic 
EEG is discussed next. 
1.2.3 Detecting Changes in the EEG


The reason that EEG is so useful in the study of epilepsy is that it can 
quantitatively show the changes of brain activity over time. Because seizures 
represent a rare phenomenon, detecting these changes is important for the 
diagnosis and treatment of the disorder. Reliable automated detection of 
changes that lead up to a seizure is the major theme in this book. 

The purpose of the detection of changes varies. It may be used solely 
to distinguish between seizure and inter-seizure EEG, useful for diagnosis. 
Alternatively it may be predictive, in that it indicates that a seizure is
imminent before it happens: detection of the pre-seizure state. The ability to
predict seizures would make it possible to manage the disorder better, and it
would allow a complete re-evaluation of pharmacological treatment [91].
Certainly the social benefits are indisputable: it is the apparent randomness
of seizures that makes epilepsy so debilitating [96].

More than twenty years of research have demonstrated that selection of 
features that best distinguish between inter-seizure, pre-seizure and seizure 
EEG is a difficult task. Most likely this is because time fragments of EEG 
cannot in general be simply labeled either ‘normal’ or ‘epileptic’ [193], that is, 
the decision space is much more complex and these two ‘categories’ may not 
be separable in general. The complexity may be gleaned from the observation
that a 1,200-page book edited by Ernst Niedermeyer and Fernando Lopes da
Silva ([118]) is often used as the handbook by electroencephalographers, and
is dedicated solely to the description of the EEG.

The difficulties have not stopped the hunt for the optimal features. Epilepsy 
is, after all, the most common recurrent neurological condition in the world. 
The features themselves are extracted from the EEG through signal process- 
ing, a process by which a signal, in this case the EEG, is transformed to a 
quantitative form that is more compact than the raw data so that it can be 
understood more easily. To detect the onset of seizures it is logical to target 
quantitatively what human experts target qualitatively. Human experts base 
their decisions on information such as 


1. Spatial and temporal information: A seizure is usually reported when its 
duration is long (short epileptic bursts, e.g., 1 second, may sometimes 
be classified as spikes) and not local. Oscillations in a single channel are 
unlikely to be epileptic in nature because epileptic events have a field 
which nearly always involves nearby electrodes. 


2. Background state: A clear and well defined period of different EEG pat- 
tern to the background activity is sought when classifying the epileptic 
EEG. Of most importance is the difference between the awake and asleep 
EEG, the latter being slower and more global in nature. 


3. Expert-knowledge rejection: Bursts that comply with criteria (1) and (2)
may still be artifactual in nature. At this stage, the expert uses accumulated
knowledge, gathered through experience, to distinguish between an epileptic
event, a common artifact and normal rhythms.
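Criterion (1) can be mimicked quantitatively, as the sketch below shows. It computes the line length of each one-second window, which is the sum of absolute sample-to-sample differences and a feature used in some seizure detectors. It marks windows where at least two channels exceed three times their background median, and keeps only runs lasting at least 10 seconds. Several choices here are hypothetical: all thresholds, the assumption that the opening windows are background, the synthetic data and the function names. Criteria (2) and (3) are not modeled.

```python
import numpy as np


def line_length(frames):
    """Sum of absolute sample-to-sample differences along the last axis."""
    return np.abs(np.diff(frames, axis=-1)).sum(axis=-1)


def flag_seizure_windows(eeg, fs, win_s=1.0, baseline_windows=8, factor=3.0,
                         min_channels=2, min_duration_s=10.0):
    """Flag analysis windows meeting toy spatial and temporal criteria.

    eeg: array (n_channels, n_samples). The first `baseline_windows` windows are
    assumed to be background EEG. A window is a candidate when line length exceeds
    `factor` times that channel's background median in at least `min_channels`
    channels (not local); candidates are kept only in runs lasting at least
    `min_duration_s` seconds (not a brief burst).
    """
    eeg = np.asarray(eeg, dtype=float)
    win = int(win_s * fs)
    n_ch, n_samp = eeg.shape
    n_win = n_samp // win
    frames = eeg[:, :n_win * win].reshape(n_ch, n_win, win)
    ll = line_length(frames)                                   # (n_ch, n_win)
    base = np.median(ll[:, :baseline_windows], axis=1, keepdims=True)
    candidate = (ll > factor * base).sum(axis=0) >= min_channels
    min_run = int(np.ceil(min_duration_s / win_s))
    flagged = np.zeros(n_win, dtype=bool)
    run_start = None
    for i, is_candidate in enumerate(np.append(candidate, False)):
        if is_candidate and run_start is None:
            run_start = i                                      # a run begins
        elif not is_candidate and run_start is not None:
            if i - run_start >= min_run:                       # long enough to keep
                flagged[run_start:i] = True
            run_start = None
    return flagged


if __name__ == "__main__":
    fs = 256                                        # assumed sampling rate (Hz)
    t = np.arange(60 * fs) / fs                     # one minute, four channels
    rng = np.random.default_rng(0)
    eeg = rng.standard_normal((4, t.size))
    burst = 40 * np.sin(2 * np.pi * 8 * t)
    eeg[:, 10 * fs:12 * fs] += burst[10 * fs:12 * fs]    # 2 s burst, all channels
    eeg[:3, 30 * fs:45 * fs] += burst[30 * fs:45 * fs]   # 15 s event, three channels
    flagged = flag_seizure_windows(eeg, fs)
    print("flagged windows (s):", np.flatnonzero(flagged))
```

Only windows 30 to 44 are flagged. The 2 s burst is rejected by the duration rule even though it involves every channel. A real detector would also need the background-state and expert-rejection steps that this sketch leaves out.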


The guidelines that must be used to predict seizures are less clear because
the human eye is often incapable of detecting the changes that lead to a
seizure. Current predictors perform poorly and have been tested only on very
specific cases (e.g., [71], [113]). The question ‘can seizures be predicted?’
has not yet been adequately answered in the literature.

In the identification of the seizure or pre-seizure state the ability to quan- 
tify rules that differentiate them from the inter-seizure EEG is imperative 
to the performance of the detector. Current detectors are data driven, also 
known as black box models, and do not infer any knowledge beyond that which 
is presented by the EEG data. A review of the features commonly used and 
how they perform in the context of seizure detection is given in Chapter 3 and 
Chapter 5. 

An alternative strategy is to develop detectors based on an understanding 
of how the brain behaves. This involves translating the physiology (Section 
1.1) and its measurements (Section 1.2) into a mathematical dynamic model. 
Unlike black box methods, dynamic models can be used to infer beyond the 
data that is available, and may be useful when solving difficult problems such 
as prediction of seizures. Important concepts in the construction of these 
physiologically based dynamic models are introduced next. 


1.3 Dynamics of the Brain 


The dynamics of the brain describe how activity in this system evolves over
time. In engineering a model is typically a set of mathematical equations that 
explain this behavior. In neuroscience it is typical to use the word to refer 
to animal models — a condition in an animal, often induced artificially, that 
is similar to that of a human. Both types of models are designed to describe 
and understand activity found in the brain, and both can simulate the be- 
havior they are trying to replicate. But whilst animal models are most often 
used to understand the mechanisms of a specific pathology, mathematical dy- 
namic models are in theory capable of describing both normal and pathological 
behavior. More importantly, a mathematical model allows computation of ex- 
pected system behavior, a much cheaper and more reproducible alternative 
to animal models! For example, a suitable model of brain dynamics can be 
used to simulate the expected activity within the brain. In conjunction with 
information about the properties of the head it can then be used to compute 
the expected EEG waveforms at the scalp. In some cases computation allows 
prediction of future behavior. 

The data found through an animal model are helpful in the development 
of its mathematical counterpart. In particular, animal models have proven
very useful in understanding the mechanisms of epilepsy: they can be used to
generate epileptic data reproducibly, which for ethical reasons would be
impossible to obtain from human experiments. However, a mathematical model of
brain activity is not the same as a mathematical model of epilepsy: the former
is a description of how the normal brain behaves and the latter is a
description of a pathology. In theory a model of normal activity should also be
capable of describing epilepsy if the parameters of that model — the numbers 
that describe the physiology of the brain — are changed accordingly. The 
animal models provide information as to what the relevant parameters are 
and how their behavior deviates from the ‘normal’ brain. 


The process of constructing a mathematical model can be di- 
vided into three steps: 


1. Creating a mathematical dynamic model structure: By using knowledge 
of the mechanisms of the brain the set of mathematical equations that 
best describe this system is written down. Identification of the relevant 
physiology depends on the scale of interest, which for epilepsy should
be larger than the micro-scopic scale. The model structure contains a
number of parameters that should be tuned to select a particular model
from the collection of models represented by the model structure.


2. Gathering experimental data: Both animal models as well as routine 
medical measurements such as the EEG give information about the be- 
havior of the brain. For epilepsy such data involves periods of ‘normal’ 
as well as pathological activity. 


3. Validating the model: Using the experimental data gathered in Step 
2, the dynamic model must be validated through simulations to see if 
activity of this type, both normal and abnormal, is reproducible when 
relevant changes are made to the parameters. The dynamic model may
equally be invalidated, in which case further detail or changes in
assumptions are needed in Step 1.
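The three steps can be illustrated with a deliberately simple stand-in, not the physiological models this book builds toward. The sketch uses a second-order autoregressive (AR(2)) structure, a generic linear dynamic model, and synthetic data in place of recordings. The mapping to the steps is:

1. Structure: an AR model with two tunable parameters, fitted by least squares.
2. Data: synthetic, split into a fitting half and a held-out half.
3. Validation: the one-step prediction error on the held-out half.

A too-simple AR(1) structure fits noticeably worse, which is how a structure would be invalidated. All function names are hypothetical.

```python
import numpy as np


def simulate_ar2(a1, a2, n, noise_sd=1.0, seed=0):
    """Synthetic 'experimental data' from x[k] = a1 x[k-1] + a2 x[k-2] + e[k]."""
    rng = np.random.default_rng(seed)
    e = noise_sd * rng.standard_normal(n)
    x = np.zeros(n)
    for k in range(2, n):
        x[k] = a1 * x[k - 1] + a2 * x[k - 2] + e[k]
    return x


def lagged(x, order):
    """Regressor matrix whose column j holds x[k-j-1] for k = order .. len(x)-1."""
    return np.column_stack([x[order - j - 1:len(x) - j - 1] for j in range(order)])


def fit_ar(x, order):
    """Tune the parameters of an AR structure of the given order by least squares."""
    x = np.asarray(x, dtype=float)
    coeffs, *_ = np.linalg.lstsq(lagged(x, order), x[order:], rcond=None)
    return coeffs


def prediction_rms(x, coeffs):
    """RMS one-step-ahead prediction error on (held-out) data."""
    x = np.asarray(x, dtype=float)
    order = len(coeffs)
    return np.sqrt(np.mean((x[order:] - lagged(x, order) @ coeffs) ** 2))


if __name__ == "__main__":
    data = simulate_ar2(1.6, -0.8, 10_000)     # damped oscillation driven by noise
    train, test = data[:5000], data[5000:]
    ar2 = fit_ar(train, order=2)
    ar1 = fit_ar(train, order=1)
    print("AR(2) coefficients:", np.round(ar2, 2))
    print(f"held-out RMS error, AR(2): {prediction_rms(test, ar2):.2f}")
    print(f"held-out RMS error, AR(1): {prediction_rms(test, ar1):.2f}")
```

The AR(2) fit recovers coefficients close to the generating values, and its held-out error is close to the noise level. The AR(1) structure cannot reproduce the oscillation, and its error is clearly larger.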


The type of data that will be used to validate the model is important 
when considering Step 1. For example, if the data reflect the electrical
activity of the brain then they cannot be used to validate a model that
describes the metabolic processes. In-depth knowledge of the physical as well
as the measurement processes involved in generating the validation data is
necessary so that the models can reflect this activity (see Figure 1.2). Since
the EEG is the most commonly available data describing changes over time, it is
the global electrical activity of the brain that is most often used to generate
global dynamic models. It is important to understand both the generating system,
the brain, as well as the measurement system itself in order to understand 
the EEG signal. A detailed explanation of EEG measurement is presented in 
Chapter 2. 

Simplifying assumptions are necessary wherever possible because the brain
contains on the order of 100 billion neurons. This is not because greater
complexity makes the model worse (although too much complexity can indeed
degrade performance!), but because current technology cannot cope with such
high computational demand. The simplifying assumptions make the problem
tractable at the expense of detail, which is justifiable only when such
detail is not essential. For example, if it is known that the activity within one 
column is roughly uniform, then modeling can shift from single neurons to sin- 
gle columns. This is only valid at the macro-scopic scales, suitable for EEG 
measurements, because small differences within a column would be averaged 
out by a large-scale measurement. Micro-scopic recordings, on the other hand, 
would be affected even by these small differences. Other relevant assumptions 
for macro-scopic modeling are listed in Chapter 6. 

Once the scale of interest has been chosen and the corresponding assumptions
made, the appropriate free parameters that remain in the model must be identified.
For an arbitrary network this involves the identification of structural compo- 
nents such as neuron population and network topology, as well as the dynamic 
parameters including the inputs, outputs, internal signals and delays in the 
system. 


1.3.1 Micro- and Macro-Scopic Models 


In a model of the micro-scopic system, say one neuron, the most important 
process in its dynamics is the integration of incoming PSPs and the conse- 
quent firing of action potentials. The activity at the synapses is the input. 
The action potential is the output. Internal dynamics must account for the
integration of the various PSPs at the soma, the propagation of PSPs from
their synapses to the soma, and the generation (or not) of the action
potential itself. The dynamics of this system are governed by the strength
of the inputs, the structure of the neuron and delays in activity.
In a neuron, delays are caused by propagation times from synapse to soma and
from soma to synapse, which differ depending on where on the dendritic tree
connections are made. Delays also occur in the time it takes for inputs and
outputs to turn ‘on’ and ‘off’, corresponding to how synaptic gates open and
close. Opening and closing times are not necessarily the same, as seen in
Figure 1.5(b): a PSP is not symmetric in time.
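A common simplification of these single-neuron dynamics, and not the model developed in this book, is the leaky integrate-and-fire neuron. It lumps dendrites and soma into a single compartment and so ignores the propagation delays just described. The sketch below is a minimal version under that assumption: the membrane potential integrates the input while leaking back toward rest, and an action potential is recorded whenever a threshold is crossed. The function name `lif_spike_times` and all parameter values are illustrative assumptions.

```python
import numpy as np

def lif_spike_times(input_current, dt=1e-4, tau_m=0.02, r_m=1e7,
                    v_rest=-0.070, v_thresh=-0.054, v_reset=-0.080):
    """Spike times (s) of a leaky integrate-and-fire neuron.

    input_current: injected current (A), one value per time step of dt seconds.
    Parameter values are illustrative, not taken from the text.
    """
    v = v_rest
    spikes = []
    for step, i_in in enumerate(input_current):
        # Forward-Euler update: leak toward rest plus integration of the input.
        v += dt / tau_m * (v_rest - v + r_m * i_in)
        if v >= v_thresh:          # threshold crossed: emit an action potential
            spikes.append(step * dt)
            v = v_reset            # membrane is reset after firing
    return spikes

n_steps = 5000                                       # 0.5 s at dt = 0.1 ms
weak = lif_spike_times(np.full(n_steps, 1.0e-9))     # 1 nA input
strong = lif_spike_times(np.full(n_steps, 2.0e-9))   # 2 nA input
print(len(weak), len(strong))
```

With these hypothetical values a 1 nA input only depolarizes the membrane to about -60 mV and never fires, while 2 nA drives it past threshold repeatedly, showing how input strength governs whether an action potential is generated.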

At larger scales it is not the single neuron that is important but the average 
activity of the ensemble. If our system is a cortical column in which most 
neurons are assumed to work similarly, the inputs to this system are incoming 
signals from cortico-cortical and thalamo-cortical connections. The model
output depends on what is being simulated. If it is a measurement such as 
the EEG that is being modeled, then the output is the electrical activity 
projected through to the measurement system. If instead it is the neural 
interactions that are important, then the outputs are simply the projections 
of activity to unseen cortical and subcortical regions. Propagation delays 
from the micro-scopic scale are relatively unimportant here, but the capacitive 
effects caused by gating mechanisms can shape the electrical activity of the 
EEG. The network topology, that is, the number of neurons and their average 
connections, is relevant to the generation of the internal signaling in this
meso-scopic system.

Larger models can incorporate networks in which many cortical columns 
or subcortical systems interact with one another. Because such sub-systems 
are approximately independent of one another [120], each can be modeled
separately and a larger system constructed by interconnecting the smaller
ones. In this case the inputs and the outputs remain the same
(cortico-cortical and thalamo-cortical projections), but additional delays cor- 
responding to the time that a signal takes to travel between sub-systems are 
important in the generation of rhythms observed in the EEG. An example of 
a combination of sub-systems can be found in Figure 1.7. 
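The role of inter-system delays can be illustrated with a toy model that is an assumption of this example, not a model from the text. Two sub-systems `x` and `y` relax toward zero on their own, `x` excites `y` and `y` inhibits `x`, and each connection carries a transmission delay. Without delay, activity dies away as a damped oscillation; with a delay of 0.5 (arbitrary time units) the same loop sustains a rhythm. The function `late_amplitude`, the gain of 3 and the saturating `tanh` coupling are all hypothetical choices.

```python
import math
import numpy as np

def late_amplitude(delay, gain=3.0, dt=1e-3, t_end=30.0):
    """Largest |x| over the final 5 time units of two delay-coupled sub-systems.

    Toy model (hypothetical):
        dx/dt = -x - gain * tanh(y(t - delay))   # y inhibits x
        dy/dt = -y + gain * tanh(x(t - delay))   # x excites y
    integrated with forward Euler; before t = 0 each history equals its initial value.
    """
    n_steps = int(round(t_end / dt))
    lag = int(round(delay / dt))
    x = np.zeros(n_steps + 1)
    y = np.zeros(n_steps + 1)
    x[0] = 0.1                                   # small initial perturbation
    for k in range(n_steps):
        x_delayed = x[k - lag] if k >= lag else x[0]
        y_delayed = y[k - lag] if k >= lag else y[0]
        x[k + 1] = x[k] + dt * (-x[k] - gain * math.tanh(y_delayed))
        y[k + 1] = y[k] + dt * (-y[k] + gain * math.tanh(x_delayed))
    tail = x[-int(round(5.0 / dt)):]
    return float(np.max(np.abs(tail)))

print("no delay: ", late_amplitude(0.0))
print("delay 0.5:", late_amplitude(0.5))
```

In this sketch the undelayed loop decays to essentially zero, whereas the delayed loop settles into a sustained oscillation whose amplitude is limited by the saturating coupling.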

The last three paragraphs show that some of the dynamics at
the micro-scopic scale can shape activity at the meso- and macro-scopic scales.
The converse is also true: macro-scopic activity can affect the dynamics at
the micro-scopic scale because average electrical activity over a large ensemble 
dictates how a single neuron reacts to an input. This bi-directional relation- 
ship between scales is depicted in Figure 1.11. 


A model of a complete system must account for how smaller 
scales affect the dynamics of larger scales, and, in turn, how the 
activity at these larger scales affects the behavior at the smaller
scales. 


FIGURE 1.11: Representation of the intra- and inter-scale interactions in
the development of a model. Within each chosen scale there is interaction
between sub-systems of the same scale. Activity can also transcend scales, as
shown, because the small-scale dynamics can affect the larger-scale dynamics.
In turn, large-scale behavior can affect smaller-scale behavior. Appropriate
assumptions must be made in the process of developing a model so that, once
the scale of interest is determined, these interactions are not ignored. The
identification of the parameters that can transcend scales is a key challenge.

In creating the macro-scopic EEG model it is not always obvious which
physiological processes transcend scales. Experimental evidence shows that
the most important are the capacitive effects and strengths of PSPs occurring
at micro-scopic scales. Changes in these can affect the EEG frequencies of
ensemble systems [42]. Other micro-scopic activity, such as in-neuron
propagation delays and dendritic structure, is often ignored because it
seemingly has little impact on EEG global dynamics [73]. Exactly which
processes are thought to be important is described in further detail in
Chapter 6, but recall that, by the relationship between spatial and temporal
scales, EEG activity at macro-scopic scales is slower than at micro-scopic
scales. A rule of thumb is that dynamic processes at smaller scales affect the
EEG at larger scales only if the activity is comparable in timescale. Synaptic
signals are slow and overlap in time, and are thus incorporated, whereas fast
action potentials are omitted (see Figure 1.5).
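The timescale rule of thumb can be sketched numerically. Here, as stated assumptions, a 1 ms rectangular pulse stands in for an action potential, a difference of exponentials with 2 ms rise and 20 ms decay stands in for an asymmetric PSP, and a 20 ms moving average stands in for a measurement that only resolves slow, macro-scopic timescales. None of these shapes or numbers come from the text, and `smoothed_peak` is a hypothetical helper.

```python
import numpy as np

dt = 1e-4                                   # 0.1 ms time step
t = np.arange(0.0, 0.2, dt)                 # 200 ms of simulated time

# Fast event: a 1 ms rectangular pulse standing in for an action potential.
spike = np.zeros_like(t)
spike[: int(round(1e-3 / dt))] = 1.0

# Slow event: asymmetric PSP-like waveform (fast rise, slow decay), modeled
# here as a difference of exponentials; the time constants are assumptions.
tau_rise, tau_decay = 2e-3, 20e-3
psp = np.exp(-t / tau_decay) - np.exp(-t / tau_rise)
psp /= psp.max()                            # normalize the peak to 1

def smoothed_peak(signal, window_s=20e-3):
    """Peak of `signal` after a moving average lasting window_s seconds.

    The moving average is a crude stand-in for a measurement that only
    resolves slow (macro-scopic) timescales.
    """
    n = int(round(window_s / dt))
    kernel = np.ones(n) / n
    return float(np.convolve(signal, kernel, mode="full").max())

print("spike peak after smoothing:", smoothed_peak(spike))
print("PSP peak after smoothing:  ", smoothed_peak(psp))
```

In this sketch the pulse keeps only about a twentieth of its peak after smoothing, whereas the PSP-like waveform keeps most of it, consistent with slow synaptic activity dominating the macro-scopic signal.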


1.3.2 Dynamic Models of Epilepsy 


Epilepsy is often treated as a special case easier to model than an entire brain 
because it is a state in which a large population of neurons behave similarly, 
thus simplifying the problem. However, many of the mechanisms of epilepsy
arise from ‘normal’ processes and even a healthy brain is capable of seizing. 
Thus an understanding of epilepsy requires an understanding of the ‘normal’ 
brain. 

In this book the approach is to focus on a single model that can describe 
the electrical activity of a ‘normal’ brain and in which suitable changes are 
representative of the many types of epilepsies. These changes can differ sig- 
nificantly depending on the pathology but should be supported by evidence 
found in animal models and be reflected by the appropriate parameters in the 
mathematical model. For example, in some types of focal epilepsy the focus 
contains abnormally large pyramidal neurons [12]. A model of macro-scopic
electrical activity may not contain parameters that describe the size of a typ- 
ical neuron. This information must be captured by other parameters that 
describe how these neurons behave differently from healthy neurons rather 
than how differently they look. 

Section 1.1.4 outlined two key questions in understanding epilepsy:
initiation and generalization. A useful model should at a minimum be
capable of replicating each of these, in the hope that understanding
the parameter changes that lead to a seizure may shed light on the physical 
processes themselves. Two approaches are possible: 


1. Focus on initiation: If normal inputs are capable of driving the brain 
into seizure then what are the model parameters responsible for this? 
Animal models are important in pinpointing relevant abnormal physiology
and processes in epilepsy. The mathematical model can be validated if it
is able to replicate observed experimental results. Conversely,
because the mathematical models are based on physiology they
can themselves be used to validate the mechanisms of epilepsy obtained 
from the animal models. 


2. Focus on spread and generalization: This approach is only suitable for 
focal epilepsies in which the origin of epileptic activity is known. Non- 
focal epilepsies spread so fast that research is limited to the study of 
initiation of a seizure. Here we assume that the focus is behaving badly, 
in that hyper-synchronous and hyper-active behavior feeds into nearby 
cortex and subcortex. This output of the abnormal focus becomes the 
input to other normal regions of the brain. How does this result in the 
spread of activity? 


The two approaches, illustrated in Figure 1.12, are not mutually exclusive,
but treating them as such can simplify the problem. If these models are to
be used for the treatment of epilepsy, the choice of approach depends on
whether the therapy is administered to prevent the initiation or the
spread of epileptic activity. The former requires prediction of a seizure, whilst
the latter focuses on detection, as discussed in Section 1.2.


1.4 Stochasticity in Neural Systems 


The word random, often used interchangeably with stochastic, must not be
confused with the colloquial understanding that ‘nothing can be said about a
random event’. A stochastic process can be described with a certainty dictated
by the probability distribution of the process itself. Think for example of
flipping a coin. Assuming no bias, each toss is an instance of the process with
a 50% chance of being heads and 50% tails. For each toss we cannot say what the
outcome will be until it actually happens. However, on average and given a
sufficient number of trials, we expect half the tosses to be heads and half tails,
as dictated by the probability distribution of this experiment.

FIGURE 1.12: Two questions that must be answered in a mathematical model
of focal epilepsy: initiation and generalization. In (a) the method is to
concentrate on how the epileptic focus begins to behave incorrectly, that is, how
the seizure is initiated. (b) looks at how this activity spreads once the focus
is out of control. In both pictures above the gray cortical column represents
the epileptic focus. The difference between the two methods lies in the flow
of information. Approach (a) looks at how inputs to the abnormal neurons
fail to control runaway activity, whereas (b) looks at how connections that
project out of the focus entrain the remainder of the brain.


A stochastic or random event cannot be predicted unambigu- 
ously — its value is only certain once it has occurred. However 
if the event space (the set of all possible values the event may 
take) and its probability distribution are known then it is ex- 
pected that the average or ensemble behavior is as described by 
this distribution. If this distribution does not change over time 
this is known as a stationary process. 
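The convergence of the average toward the distribution can be seen with a short simulation of a fair coin. This is a sketch using NumPy's random generator; the function name `fraction_heads` and the seed are arbitrary choices made for reproducibility.

```python
import numpy as np

def fraction_heads(n_tosses, seed=1):
    """Fraction of heads in n_tosses simulated tosses of an unbiased coin."""
    rng = np.random.default_rng(seed)
    tosses = rng.integers(0, 2, size=n_tosses)   # 1 = heads, 0 = tails
    return float(tosses.mean())

for n in (10, 100, 10_000, 1_000_000):
    print(f"{n:>9} tosses: fraction of heads = {fraction_heads(n):.4f}")
```

Short runs can stray noticeably from 0.5, but the fraction settles close to 0.5 as the number of tosses grows, even though no single toss is predictable. Because the probability of heads does not change from toss to toss, this process is also stationary.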


In the brain, events are not disconnected from each other like the successive
tosses of a coin. If the EEG is measured at 100 µV in one sample then it is
very unlikely that the next sample will read -100 µV. Information about the
past limits the possibilities of the future. If the possible future values are
confined to a sufficiently small range, then that range amounts to a prediction
of what is likely to happen.
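The contrast between independent coin tosses and a signal whose past constrains its future can be sketched with a first-order autoregressive process, x[k] = a x[k-1] + noise. This AR(1) model is an assumed stand-in chosen for illustration, not a model of the EEG proposed in this book, and `step_to_spread_ratio` is a hypothetical helper. It compares the typical change between successive samples with the overall spread of the signal.

```python
import numpy as np

def step_to_spread_ratio(a, n=50_000, seed=2):
    """Std of successive differences divided by std of the signal.

    Hypothetical AR(1) process x[k] = a * x[k-1] + noise[k]: a = 0 gives
    independent samples (like coin tosses); a near 1 gives samples that
    depend strongly on the past.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n)
    x = np.zeros(n)
    for k in range(1, n):
        x[k] = a * x[k - 1] + noise[k]
    return float(np.std(np.diff(x)) / np.std(x))

print("independent samples (a = 0):   ", step_to_spread_ratio(0.0))
print("strong memory       (a = 0.95):", step_to_spread_ratio(0.95))
```

With a = 0 the samples are independent and the typical step is about √2 times the spread; with a = 0.95 it is only about a third of the spread, so a large positive sample is very unlikely to be followed immediately by a large negative one.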

In this book we describe many systems with stochastic elements: 
1. Brain sources (see Section 1.1): Even a deterministic brain has
behavior dependent on stimuli that drive the brain (e.g., sensory
information), which cannot be predicted and are thus best modeled as
stochastic. Within the brain itself the behavior of the various cells, gates and
chemicals is not known exactly because experiments provide information 
averaged over many trials. Thus the behavior of the brain can be said 
to be stochastic, either because the experiments themselves introduce 
stochasticity to the system or because the source is truly a stochastic 
process. 


2. Measurements (see Section 1.2): Determinism in principle means that 
there is zero uncertainty left. For a measurable system this is an illusion 
— measurements can only be made to a limited precision and thus there 
is never certainty, only shades of uncertainty. In this context stochas- 
ticity is the only way of describing reality and determinism becomes a 
mathematical abstraction. 


3. Dynamic models (see Section 1.3): Meso- and macro-scopic models 
assume uniform behavior of cells: given an equivalent input two pyra- 
midal cells react in exactly the same way. This is not true because slight 
differences in the sizes of pyramidal neurons or chemical concentrations 
may result in differences in the output. Models of ensembles of neu- 
rons where the detail of each individual neuron is lost can be said to 
be true only on average, thus they are stochastic representations of the 
real system. Despite this ‘variability’, modeled as stochasticity, higher 
brain function is reliable and repeatable. The brain can thus cope with, 
or perhaps even rely on, a certain level of random behavior. 


1.5 Conclusions and Further Reading 


This chapter briefly presented the physiology of the brain and served
primarily to introduce the central topic of the book: how epilepsy may be
perceived from an EEG measurement. To probe further into the fascinating
world of brain structure and to learn more about the chemo-kinetic processes
underpinning brain behavior, one may refer to excellent textbooks such as [80].

The primary questions that will be addressed in the remainder of this book 
relate to how the EEG (in its various forms) relates to epilepsy: 


• How well can we label epochs of EEG as epileptic or not?
• Can we explain how epileptic behavior generalizes and evolves over time?
• Can we predict transitions into epileptic episodes?
These questions do not have simple solutions, and are discussed in Chapters
5-7 in the context of epileptic seizure detection, mathematical models
of epileptic activity, and the predictability of epileptic seizures, respectively.
However, some theory must first be covered. We begin by presenting a more
thorough explanation of the measurement system and its limitations in
Chapter 2, followed by how we can make sense of the measurements using
signal analysis and classification tools in Chapters 3-4.


2 


EEG Generation and Measurement 


“Since every piece of matter in the Universe is in some way affected 
by every other piece of matter in the Universe, it is in theory possi- 
ble to extrapolate the whole of creation — every sun, every planet, 
their orbits, their composition and their economic and social his- 
tory from, say, one small piece of fairy cake." 


- ‘The Restaurant at the End of the Universe’, Douglas Adams 
(1952-2001) 


A blatant misrepresentation of the truth? Yes. Can we tell everything 
about the universe from a piece of fairy cake? Of course not. What can we tell 
about the universe from a piece of fairy cake? That depends on what is being 
measured. This tongue-in-cheek oversimplification of a complex phenomenon 
(true to form, the Douglas Adams way) may have been written to make us 
laugh, but does pose an important question — what can observations tell us 
about a system? The electroencephalogram, EEG for short, provides us with 
measurements of the temporal distribution of electrical activity in different 
parts of the head, generated by potentials in the hundred billion or so neurons 
in the brain. How can these measurements be used to tell us about the 
behavior of these neurons? 

The aim of this chapter is to address the ‘simpler’ question: what can
the EEG recordings tell us about the underlying brain activity? And, just as
importantly, what can they not tell us?

To address these questions the neurophysiologist will refer to the anatomy
and neural structure of the brain, the physicist will begin by deriving a set of
partial differential equations to describe charge distributions and their
interactions, while the electrical engineer’s instinct will be to draw equivalent
circuits and derive linear transfer functions that describe the human head.
None of these approaches is wrong; all are relevant and valid methods that must
be considered to determine how the measurement of EEG is affected by its
unique environment.
But in order to obtain a complete picture the relationship between physiology, 
charge distribution and electrical properties of materials must be understood 
to determine what the EEG, our proverbial piece of fairy cake, is capable of 
telling us about the underlying brain activity. 

An important distinction made throughout is between EEG dynamics
and EEG measurement. The former refers to how charges, currents, and
activation patterns are generated by inbuilt mechanisms of the brain. This
includes the gate-triggered synaptic activity, the subsequent initiation of action
potentials and the network organization that maintains activity in the brain.
A measurement, on the other hand, is a record of this activity in an attempt to
quantify the charge distribution, currents and activation patterns described
by the dynamics. The relationship between measurement and dynamics
depends on the scale that is being measured as well as how the measurement
is performed.

FIGURE 2.1: The aim of a measurement is to capture a real signal. However,
the precision of measurement affects how closely the measured signal (solid
line) approximates the real signal (dashed line). The more bits of information
used, the closer the measurement approximates the real signal, as shown in
(a), (b) and (c). The differences between the measured and the real signal are
errors that are often modeled as stochastic or random variables.

In any case a measurement is limited by the quality of the recording equip- 
ment. This includes its sensitivity imposed by physical constraints (e.g., how 
sensitive an electrode is) as well as how the data is represented. For example if 
the measured signal can itself only take two values then one bit of information 
(that is, a number that can take two values) can be used to store a perfect rep- 
resentation of the measured sample. However if one bit is used to represent 
a signal that can take four values then information is lost in the measure- 
ment process. Increasing the number of bits reduces the measurement error, 
as shown in Figure 2.1, but only down to the sensitivity of the equipment. 
Measurement error is unavoidable. If we have well calibrated equipment so 
that no consistent errors are made, measurement error can be modeled as a 
stochastic process. 
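The effect of the number of bits shown in Figure 2.1 can be sketched with a uniform quantizer. This is an idealized assumption: real recording equipment has additional noise and a finite sensitivity that the sketch ignores, and `quantize` is a hypothetical helper that maps each sample to the nearest of 2^n equally spaced levels between -1 and 1.

```python
import numpy as np

def quantize(signal, n_bits, v_min=-1.0, v_max=1.0):
    """Uniformly quantize `signal` to 2**n_bits levels spanning [v_min, v_max]."""
    levels = 2 ** n_bits
    step = (v_max - v_min) / (levels - 1)        # spacing between adjacent levels
    clipped = np.clip(signal, v_min, v_max)      # values outside the range saturate
    index = np.round((clipped - v_min) / step)   # index of the nearest level
    return v_min + index * step, step

t = np.linspace(0.0, 1.0, 1000, endpoint=False)
real = np.sin(2 * np.pi * t)                     # the 'real' signal (dashed line)
for n_bits in (1, 2, 4, 8):
    measured, step = quantize(real, n_bits)
    max_err = np.max(np.abs(measured - real))
    print(f"{n_bits} bits: step = {step:.4f}, max error = {max_err:.4f}")
```

For each bit depth the maximum error is at most half the spacing between levels, and it shrinks as bits are added.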

EEG measurements at the scalp, cortex or elsewhere in the head are also 
affected by the electrical properties of the surrounding medium, known as the 
volume conductor. This is because the many biological tissues each alter the 
electric fields generated within the brain differently. The active dynamics of 
the brain can be treated separately from the volume conductor because at 
EEG frequencies its effects are predominantly passive, as is explained in more 
detail later. However understanding how the volume conductor changes the 
measurement is important for understanding the EEG, and is the central focus 
of discussion throughout this chapter. 
The existence of electrical activity in the brain was first discovered in the
late 1800s, largely due to Richard Caton's experiments on rabbits and
monkeys, although it was not until 1924 that the German psychiatrist Hans
Berger recorded the first human EEG. Other forms of bioelectric events
have been recognized for much longer (Egyptian hieroglyphs from as early as
4000 BC have been found depicting a catfish capable of generating electric
pulses) but were for a long time thought to be the result of ‘animal spirits’ and
not linked to electrical activity. The word electricity only came about in 1600
when William Gilbert named it after the Greek word for amber (ἤλεκτρον).
Gilbert devised the first instrument to detect the attractive force generated
by static electricity in amber (Figure 2.3(a)). Since then Johann Salomo
Christoph Schweigger's development of a galvanometer in 1821, Faraday's
induction coil in 1831, Maxwell's equations of electromagnetism in 1861 and a
myriad of other discoveries have led to the development of our understanding, 
and subsequently enabled the first human EEG recordings by Hans Berger 
(Figure 2.3(b)). The early traces bear little resemblance to today's EEG, 
but the ability to record the electrical activity of the brain has revolutionized 
clinical diagnosis and treatment of neurological conditions, and in particu- 
lar epilepsy. A much more detailed history of electricity and its relevance to 
bioelectromagnetics and EEG can be found in [103] and [118]. 


Other quantitative descriptions of brain activity exist and are in clinical 
use. The magnetoencephalogram or MEG is a relatively recent development 
that tracks the magnetic activity generated by the same current mechanisms 
that generate the electrical activity measured by the EEG. Both EEG and 
MEG have very good temporal but poor spatial resolution. The magnetic 
fields of the MEG are examined in [122] and [103], and are not discussed 
in further detail here. Better spatial resolution but zero temporal resolution 
is possible with Magnetic Resonance Imaging (MRI) which estimates topo- 
graphic maps of the structure of the brain. Functional MRI (fMRI) provides 
limited spatial and temporal resolution of the metabolic rather than electric 
processes in the brain. Co-registration of EEG and fMRI may be useful and 
has been studied in [121]. MRI may also be used to provide geometric in- 
formation for EEG analysis and is in the process of being integrated as a 
complement to EEG. 


There are on the order of 10¹⁰ to 10¹¹ interconnected neurons in the human
cortex [168]. The organization of these neurons is such that the analysis of
the electrical activity generated varies significantly depending on the scale of
interest. Up to 100 neurons grouped together form a mini-column and produce
micro-scopic fields. A cubic millimeter of cortical tissue, a macro-column¹,
may contain 1000s of interacting mini-columns that produce meso-scopic fields
(see Figure 2.2). Scalp and intra-cranial EEG both measure the macro-scopic
fields generated by averaging the behavior of many of these macro-columns.


¹Other definitions of a cortical column exist, sometimes with as few as 100 mini-columns.
[Figure 2.2: a cortical macro-column (volume ~1 mm³) and its mini-columns (diameter ~0.03 mm, ~100 neurons each)]


FIGURE 2.2: A pictorial representation of the different scales found in the 
cortex. There are in the order of 100,000 macro-columns in the human cortex, 
each occupying roughly a 1 mm³ volume of the cortex, as shown. Each
macro-column has on the order of 100,000 neurons and about 10? synapses. A
macro-column can be sub-divided into roughly 1000 mini-columns, each with ~100
neurons and ~ 10 synapses.


By convention, the potentials recorded at the micro-scopic and meso-scopic
level are denoted by φ, whilst those at macro-scopic EEG scales are denoted by Φ.

Macro-scopic potentials measured intra-cranially are predominantly de- 
pendent on the location and activity of mesosources. Macro-scopic potentials 
measured at the scalp also depend on source location and activity, but
must additionally account for the volume conducting properties of the tissues
between source and recording site. The behavior of the head volume conductor
is determined by the geometry and electrical properties of the brain,
cerebrospinal fluid (CSF), skull and scalp. Because of these volume conducting
effects, much larger areas of cortex contribute to the scalp EEG. Calculation of
the potentials generated at any scale, given source locations and activity, is 
known as the forward problem. The inverse problem, on the other hand, is the 
determination of source location and activity level given a measurement, and 
is a much more difficult task because there are many more degrees of freedom. 
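For fixed source locations and orientations, the quasi-static forward problem is linear in the source strengths, so it can be written as a matrix (often called a lead-field matrix) multiplying a vector of source activities. The sketch below uses a random 4 × 10 matrix as a hypothetical stand-in for a real head model. It shows why the inverse problem is harder: with more sources than electrodes, different source patterns can produce exactly the same measurement.

```python
import numpy as np

rng = np.random.default_rng(3)
n_electrodes, n_sources = 4, 10
# Hypothetical lead-field matrix: column j holds the potential at each electrode
# produced by unit activity of source j (random numbers stand in for a head model).
lead_field = rng.standard_normal((n_electrodes, n_sources))

# Forward problem: sources -> measurements is a single matrix product.
sources = rng.standard_normal(n_sources)
measured = lead_field @ sources

# Inverse problem: any vector in the null space of the lead field can be added
# to the sources without changing the measurement at all.
_, _, vt = np.linalg.svd(lead_field)
null_vector = vt[-1]                       # the last rows of vt span the null space
other_sources = sources + 5.0 * null_vector

print(np.allclose(lead_field @ other_sources, measured))   # same electrode readings?
print(np.allclose(other_sources, sources))                 # same sources?
```

The first check is True and the second False: the two source patterns yield the same electrode readings yet are different, so the measurement alone cannot distinguish them without further assumptions.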

The majority of this chapter is dedicated to exploring the forward problem
given the volume conducting properties of the human head. This then
determines what the EEG measures so that its limitations, and what it is
capable of telling us about the sources in the brain, may be understood. The
electric fields and electric potentials generated in different parts of the head
are explored. We consider typical source distributions in the brain
(Section 2.2), the properties of the materials in the human head (Section 2.3),
and the fields generated when typical sources exist in this medium (Section
2.4). This chapter for the most part ignores the effects of the recording
equipment and the noise or artifact present in EEG records. The dynamics of
EEG behavior are also introduced, but their mechanisms are discussed in
greater detail in Chapter 6.

FIGURE 2.3: (a) The electroscope, an instrument designed by William Gilbert
(1600) to detect static electricity. This image is reproduced from [182],
originally published in 1902. (b) An early record of the human EEG, taken from
Hans Berger's notes circa 1924. Both images have lapsed into public domain.

The reader who is more interested in the analysis of epileptic EEG than in
the details of signal generation and measurement may skip most of this
chapter, although it may still be worthwhile to browse the boxed comments
throughout the text that outline important observations.

Before moving on to the specifics of the head as a volume conductor, a
succinct introduction to electric field generation and measurement is provided.
Even the electrical engineer or physicist who is familiar with this material
may find it instructive to read this section because it outlines considerations
relevant to 3-dimensional biological systems, which differ greatly from the
conventional 1-dimensional analysis of electrical circuits.
2.1 Principles of Bioelectric Phenomena


This story, like any other in electromagnetism, begins with single charges.
Positive and negative charges are the fundamental units in electromagnetics:
it is their position, movement and interaction that determine the fields and
potentials in any medium. Although special considerations apply in biological
systems because of the geometry, charge distributions and properties of
the materials involved, bioelectromagnetism can be explained using the same
principles as those used to explain the behavior of a circuit in electrical
engineering.

Moving charges produce electric fields as well as magnetic fields that inter- 
act with one another at high frequencies but are approximately independent 
at low frequencies. Because electrophysiological applications (including the
EEG) involve the low-frequency spectrum, biological systems can be treated
as quasi-static and the fields produced can be approximated by the equations
of electrostatics. These are mathematically simpler because they ignore the
direct contributions of magnetic fields. Electrostatics also implies that at any
instant the generated field is the result of a static distribution of charges, so
the explicit time dependence of any source or resulting field can be removed.
The time dependence in the EEG thus results from the dynamics of the system,
which evolve slowly enough that, at each instant, the measured field is
effectively that of a static charge distribution.
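One rough way to see why the quasi-static approximation is reasonable is to compare the free-space wavelength of an electromagnetic wave at EEG frequencies with the size of the head. The sketch below makes that back-of-the-envelope comparison; the 0.2 m head diameter is an assumed round number, and using the free-space wavelength ignores tissue properties, which shorten the relevant length scale but, at these frequencies, still leave it far larger than the head.

```python
SPEED_OF_LIGHT = 2.998e8   # m/s, in vacuum
HEAD_DIAMETER = 0.2        # m, assumed round figure for an adult head

def wavelength_to_head_ratio(f_hz):
    """Free-space wavelength at f_hz divided by the assumed head diameter."""
    return (SPEED_OF_LIGHT / f_hz) / HEAD_DIAMETER

for f_hz in (1.0, 10.0, 100.0):
    print(f"{f_hz:5.0f} Hz: wavelength is {wavelength_to_head_ratio(f_hz):.1e} head diameters")
```

Even at 100 Hz the wavelength is millions of head diameters, so propagation effects across the head are negligible.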


This section, and the remainder of this chapter, gives an overview of the 
equations of relevance in EEG measurement of biological systems, that is, the 
bioelectric properties of the human head. Magnetic fields are largely ignored — 
to completely cover the field of bioelectromagnetic phenomena one would have 
to study the full set of Maxwell’s equations that uniquely define interactions of 
charges. EEG can be understood quite well without this level of detail, but the 
interested reader can find excellent and complete explanations of traditional 
electromagnetism as well as bioelectromagnetism in texts such as [29], [76], 
[109] and [122]. 

After some preliminary words on notation, the fields generated by meso-scopic
sources formed by grouping single charges together, denoted φ, are
looked at first. This is followed by solutions to how groups of meso-scopic
fields summate to generate a macro-scopic field in a homogeneous volume
conductor. Finally, a brief outline is given of how Φ is modified when the
volume conductor is not homogeneous.


2.1.1 A Foreword on Notation

In this book all quantities are presented in a 3-dimensional Cartesian
coordinate system with values relative to an origin, shown in Figure 2.4 with
axes labels x, y and z. Two types of quantities are referenced here: scalars and
vectors. A scalar gives only magnitude², whilst a vector has both magnitude
and direction. For example, think of a broken-down car that must be moved
off the road. The magnitude of the force exerted to push it is a scalar quantity.
However, the direction of the pushing is also important because the car must be
moved in a particular way. The direction of the force (e.g., forward) combined
with its magnitude is a vector.

FIGURE 2.4: Vectors are quantities that have direction and magnitude whilst
scalars only have magnitude. Notationally, vectors are differentiated from
scalars by bold non-italic font. In this diagram a and b are example vectors
defined in terms of an origin in 3-dimensional space. c = a − b is a vector
defined in terms of other vectors so that this origin can be arbitrarily located.
Vectors can also be re-defined in terms of a unit vector, which denotes
direction, and a scalar magnitude. For example a = a â, where a = ||a|| is the
scalar length of a, and â is a vector of unit length in the same direction as a.

In this text vector quantities are differentiated from scalars by a bold
non-italic font. For example, in Figure 2.4, a is a line that points from
the origin to a point in 3-D space. Because a is directional it is a vector. Its
length a = ||a||, where ||·|| denotes magnitude, is a scalar quantity because
it is independent of the direction of a. Vectors can also be defined in terms
of each other, e.g., c = a − b as shown, so that the origin of the co-ordinate
system can be made arbitrary.

A vector of unit length in the direction of a, denoted â, can be used
together with a scalar to describe any vector in that direction. For example,
a can be re-written as a = a â, where a is the scalar magnitude. Unit vectors
i, j and k are commonly used to describe directions along the x, y and z axes,
also shown.
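These relationships can be checked with a small numerical example. The vectors a and b below are arbitrary values chosen for illustration; NumPy's `np.linalg.norm` computes the magnitude ||a||.

```python
import numpy as np

i, j, k = np.eye(3)                  # unit vectors along x, y and z
a = 3.0 * i + 4.0 * j + 0.0 * k      # a = 3i + 4j
b = np.array([1.0, 1.0, 1.0])

a_len = np.linalg.norm(a)            # scalar magnitude a = ||a||
a_hat = a / a_len                    # unit vector in the direction of a
c = a - b                            # c = a - b points from the tip of b to the tip of a

print(a_len, a_hat, c)
```

Multiplying `a_len` by `a_hat` reproduces a = a â, and b + c recovers a; shifting the origin changes a and b by the same amount, which is why c does not depend on where the origin is placed.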


²Allowing also for negative and zero values.
2.1.2 From Single Charges to Equivalent Dipoles


A single (and now static) charge³ q suspended at location r_S in a medium with
properties that are uniform in space (homogeneous) and direction (isotropic)
produces an electric field, E(r_R), that exerts a force on other charged particles.
E(r_R) is a vector quantity measured in force per unit charge (Newtons per
Coulomb) pointing in the direction of this force at location r_R. It is given by

$$\mathbf{E}(\mathbf{r}_R) = \frac{q}{4\pi\epsilon_0\,\|\mathbf{r}_R - \mathbf{r}_S\|^2}\,\hat{\mathbf{r}} \qquad (2.1)$$

where ||r_R − r_S|| is the distance from source location r_S to recording
location r_R. r̂ is the unit vector pointing from the charge source to the
measurement location, which indicates that (for a positive charge) the electric
field is also in this direction. ε₀ is the permittivity of the medium, measured
in Farads per meter (F/m, described later). Because it describes a single
charge, Equation 2.1 represents a monopole electric field. A summary of the SI
units of each quantity and their relationships can be found in Appendix 2.A.
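Equation 2.1 translates directly into code. The sketch below assumes a homogeneous, isotropic medium whose permittivity defaults to the free-space value ε₀ (another value can be passed for a different medium); `monopole_field` is a hypothetical helper name, and positions are 3-element arrays in meters.

```python
import numpy as np

EPS0 = 8.8541878128e-12   # F/m, permittivity of free space

def monopole_field(q, r_source, r_record, eps=EPS0):
    """Electric field (V/m) at r_record due to a point charge q (C) at r_source (Eq. 2.1)."""
    displacement = np.asarray(r_record, dtype=float) - np.asarray(r_source, dtype=float)
    distance = np.linalg.norm(displacement)
    r_hat = displacement / distance          # unit vector from source to recording site
    return q / (4 * np.pi * eps * distance**2) * r_hat

origin = np.zeros(3)
near = monopole_field(1e-9, origin, [0.01, 0.0, 0.0])
far = monopole_field(1e-9, origin, [0.02, 0.0, 0.0])
print(np.linalg.norm(near) / np.linalg.norm(far))       # inverse-square: ratio of about 4
print(monopole_field(-1e-9, origin, [0.01, 0.0, 0.0]))  # negative charge: field points back toward the source
```

Doubling the distance reduces the field magnitude by a factor of four, the inverse-square behavior of Equation 2.1, and reversing the sign of q reverses the direction of the field.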

A quantity related to the electric field is the electric potential φ(r_R), also
known as voltage and measured in Volts. Electric potential can be conceptually
related to hydraulic pressure: if a pressure difference exists between two
points in a connected water pipe then the water flows from a point of high 
pressure to one of low pressure. How fast the water flows and thus its po- 
tential ability to do work depends on the difference in pressure. The electric 
potential (difference) between two points in space is equal to the amount of 
work required to move a unit of charge between these two points. To be well 
defined the amount of work has to be independent of the path taken. The 
unit is work per charge, or J/C (Joules per Coulomb, see Appendix 2.A). 

For charges, this ‘pressure’ is a consequence of the forces that exist in the 
form of electric fields defined by Equation 2.1. The potential ability to move 
charges and create electric currents, i.e., the electric potential, is determined 
by this field. 

$\phi(\mathbf{r}_R)$ depends on the recording location, but it is a scalar quantity: it has magnitude only and is independent of the path taken. It is related to $\mathbf{E}(\mathbf{r}_R)$ at low frequencies as

$$\mathbf{E}(\mathbf{r}) = -\nabla\phi(\mathbf{r}) = -\left(\frac{\partial\phi}{\partial x}\,\mathbf{i} + \frac{\partial\phi}{\partial y}\,\mathbf{j} + \frac{\partial\phi}{\partial z}\,\mathbf{k}\right) \tag{2.2}$$

where $\nabla$ is an operator that calculates the gradient of a signal, which points in the direction of maximum increase. The electric field at a particular location can thus be interpreted to point in the direction of steepest descent of the electric potential, with magnitude equal to the steepness of this descent. Because of this relationship the electric field in Equation 2.1 has units of Volts per meter.
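As a quick numerical check of Equation 2.2, the negative gradient of the monopole potential $q/(4\pi\varepsilon_0 R)$ (given as Equation 2.3 below) should reproduce the field of Equation 2.1. The Python sketch below estimates the gradient with central finite differences; the charge, the positions, the step size `h` and the use of the free-space value for $\varepsilon_0$ are illustrative assumptions, not values from the text.

```python
import math

EPS0 = 8.854e-12  # permittivity of free space (F/m); assumed value for the medium


def monopole_potential(q, r_s, r_R, eps=EPS0):
    """Monopole potential q / (4 pi eps R), with R = |r_R - r_s|."""
    return q / (4 * math.pi * eps * math.dist(r_s, r_R))


def monopole_field(q, r_s, r_R, eps=EPS0):
    """Equation 2.1: E = q / (4 pi eps R^2) times the unit vector from r_s to r_R."""
    diff = [b - a for a, b in zip(r_s, r_R)]
    R = math.sqrt(sum(c * c for c in diff))
    # Dividing by R**3 both normalises diff to a unit vector and applies the 1/R^2 law.
    scale = q / (4 * math.pi * eps * R ** 3)
    return [scale * c for c in diff]


def field_from_potential(phi, r, h=1e-6):
    """Equation 2.2: E = -grad(phi), estimated with central differences of step h."""
    E = []
    for k in range(3):
        r_plus, r_minus = list(r), list(r)
        r_plus[k] += h
        r_minus[k] -= h
        E.append(-(phi(r_plus) - phi(r_minus)) / (2 * h))
    return E


q = 1e-9                    # 1 nC, illustrative
r_s = (0.0, 0.0, 0.0)
r_R = (0.1, 0.05, -0.02)    # metres, illustrative
E_direct = monopole_field(q, r_s, r_R)
E_grad = field_from_potential(lambda r: monopole_potential(q, r_s, r), r_R)
print(E_direct)
print(E_grad)
```

The two printed vectors agree to many significant digits, and both point away from the (positive) charge, as Equation 2.1 requires.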


3 Charges can be either positive or negative. By convention electrons have negative charge and protons positive. Ions can have either. Charges with the same polarity repel and charges with opposite polarity attract.
The electric potential is not essential to explain electromagnetic phenomena, and its definition makes sense only under the quasi-static assumption that magnetic fields can be ignored at low frequencies. However, electric potential is the quantity that EEG measures and must be used to explain these recordings. The remainder of this text uses ‘fields’ to describe both electric and potential fields, related to each other as above.

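The electric potential produced by a monopole located at $\mathbf{r}_s$ can be computed from Equations 2.1 and 2.2 as

$$\phi(\mathbf{r}_R) = \frac{q}{4\pi\varepsilon_0 R} \tag{2.3}$$

with $R = \|\mathbf{r}_R - \mathbf{r}_s\|$ the distance between the source and the point of interest⁴.

Now consider a collection of $N$ charges in space, each generating potentials $\phi_1(\mathbf{r}_R), \ldots, \phi_N(\mathbf{r}_R)$. Since the electric field equations are linear, the total potential can be calculated as a linear combination of the contributions of the individual charges; this is known as the principle of superposition. Thus,

$$\phi(\mathbf{r}_R) = \sum_{n=1}^{N}\phi_n(\mathbf{r}_R) = \frac{1}{4\pi\varepsilon_0}\sum_{n=1}^{N}\frac{q_n}{R_n} \tag{2.4}$$

where $q_n$ is the $n$-th charge and $R_n$ is its distance from $\mathbf{r}_R$.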
When there are only 2 charges ($N = 2$, as in Figure 2.5(a)) of equal magnitude and opposite polarity, it can be shown that at a location $\mathbf{r}_R$ sufficiently distant from the sources

$$\phi_{\text{dipole}}(\mathbf{r}_R) = \phi_1(\mathbf{r}_R) + \phi_2(\mathbf{r}_R) \approx \frac{q\,d\cos\theta}{4\pi\varepsilon_0 R^2} \tag{2.5}$$

where $\theta$ is the angle between the direction of the charges (from the negative to the positive charge) and $\mathbf{r}_R - \mathbf{r}_s$, as illustrated in Figure 2.5(a). Here sufficiently distant refers to a distance $R$ that is large compared to the separation $d$ of the two charges, that is $R \gg d$. This configuration of charges is known as an electric dipole, and it is important for bioelectric applications for reasons explained later. The potential is strongest along the axis of the two charges, and weakest (zero) perpendicular to it.

Equation 2.5 also shows that (at large distances) the potential generated by an electric dipole decreases with distance as $1/R^2$, whereas that of a monopole in Equation 2.3 decays more slowly, as $1/R$. This is because there is a significant cancellation effect when charges of equal magnitude but opposite polarity are placed close to one another. Similar cancellation effects occur with configurations involving more than two charges. A quadrupole (containing 4 unit charges with zero nett charge) has a potential that decays as $1/R^3$, an octupole (8 charges) as $1/R^4$, and so on.
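The superposition of Equation 2.4 and the far-field approximation of Equation 2.5 can be compared directly. The following Python sketch places two opposite charges on the $z$ axis (a hypothetical layout chosen for convenience, so that $\theta$ is measured from the $+z$ axis) and reports how the relative error of Equation 2.5 shrinks as $R$ grows compared with $d$. All numerical values are illustrative.

```python
import math

EPS0 = 8.854e-12  # F/m, free-space value assumed for the medium


def superposed_potential(charges, r_R, eps=EPS0):
    """Equation 2.4: sum of monopole potentials q_n / (4 pi eps R_n)."""
    return sum(q_n / (4 * math.pi * eps * math.dist(r_n, r_R))
               for q_n, r_n in charges)


def dipole_far_field(q, d, R, theta, eps=EPS0):
    """Equation 2.5: q d cos(theta) / (4 pi eps R^2), valid for R >> d."""
    return q * d * math.cos(theta) / (4 * math.pi * eps * R ** 2)


# Hypothetical layout: +q and -q on the z axis, separated by d and centred at the origin.
q, d = 1e-9, 1e-3
charges = [(+q, (0.0, 0.0, +d / 2)), (-q, (0.0, 0.0, -d / 2))]
theta = math.radians(30)


def relative_error(R):
    """Relative difference between exact superposition and Equation 2.5 at distance R."""
    r_R = (R * math.sin(theta), 0.0, R * math.cos(theta))
    exact = superposed_potential(charges, r_R)
    approx = dipole_far_field(q, d, R, theta)
    return abs(exact - approx) / abs(approx)


for R in (0.005, 0.05, 0.5):
    print(f"R = {R} m: relative error of Eq. 2.5 = {relative_error(R):.1e}")
```

The error falls by roughly a factor of 100 for each tenfold increase in $R$: for this symmetric pair the first neglected term is the octupole term, which decays as $1/R^4$.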


4 It may be useful at this stage to look at the co-ordinate system in Figure 2.5(a) and (b). These figures are not for a single charge, but the vectors involved are the same.
FIGURE 2.5: Representative dipoles: (a) electrical, (b) current and (c) equivalent electrical or current dipole $\mathbf{P}(\mathbf{r}_s)$, under the assumption of small volumes measured at large distances from $\mathbf{r}_s$, with an equal number of positive and negative monopoles within.


Since all configurations of charges may be expressed as the sum of monopoles, dipoles, quadrupoles, octupoles and an infinite number of higher order terms, the electric potential of N charges at a large distance can be expressed as the summation over all these contributions. Furthermore, because terms of higher order than a dipole decay very fast with distance, the potential at large distance may be approximated by the monopole and dipole contributions alone.


If the volume is electrically neutral (i.e., has no nett charge), the potential does not contain a monopole term, and at a distance away from this volume it is well approximated by the dipole term alone. This term is called the equivalent dipole (see Figure 2.5(c)).


The equivalent dipole is an approximation to the potential $\phi(\mathbf{r}_R)$ generated by a volume with many charges when $R$ is much larger than the diameter of the volume in question. The unit volume covered by an equivalent dipole is known as a voxel. The arrow in Figure 2.5(c) indicates the direction of the nett field generated by the volume.
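To illustrate the equivalent dipole, the sketch below builds a hypothetical electrically neutral voxel of randomly placed charges, computes its dipole moment $\mathbf{p} = \sum_n q_n \mathbf{r}_n$, and compares the dipole-only potential with the exact superposition of Equation 2.4 at increasing distances. The cube size, charge values, random seed and viewing direction are assumptions made for the example, and the dipole is simply placed at the centre of the volume.

```python
import math
import random

EPS0 = 8.854e-12  # F/m, assumed


def exact_potential(charges, r_R, eps=EPS0):
    """Superposition of all monopoles (Equation 2.4)."""
    return sum(q / (4 * math.pi * eps * math.dist(r, r_R)) for q, r in charges)


def dipole_moment(charges):
    """p = sum_n q_n r_n; for a neutral volume this does not depend on the origin."""
    return [sum(q * r[k] for q, r in charges) for k in range(3)]


def equivalent_dipole_potential(p, centre, r_R, eps=EPS0):
    """Dipole term only: p . (r_R - centre) / (4 pi eps |r_R - centre|^3)."""
    diff = [b - a for a, b in zip(centre, r_R)]
    R = math.sqrt(sum(c * c for c in diff))
    return sum(pk * dk for pk, dk in zip(p, diff)) / (4 * math.pi * eps * R ** 3)


# Hypothetical neutral voxel: 100 positive and 100 negative charges in a 1 mm cube,
# with the positive charges shifted upwards so that a nett dipole exists.
rng = random.Random(0)
size, q0 = 1e-3, 1e-12
charges = []
for sign in (+1, -1):
    for _ in range(100):
        x, y, z = (rng.uniform(0.0, size) for _ in range(3))
        if sign > 0:
            z += 0.2 * size
        charges.append((sign * q0, (x, y, z)))

centre = (0.5 * size, 0.5 * size, 0.6 * size)
p = dipole_moment(charges)
direction = (0.3, 0.2, 0.93)  # arbitrary, approximately unit viewing direction


def relative_error(R):
    """Relative error of the equivalent dipole at roughly distance R from the centre."""
    r_R = tuple(c + R * u for c, u in zip(centre, direction))
    exact = exact_potential(charges, r_R)
    approx = equivalent_dipole_potential(p, centre, r_R)
    return abs(exact - approx) / abs(exact)


for R in (0.01, 0.1, 1.0):
    print(f"R = {R} m: relative error of the equivalent dipole = {relative_error(R):.1e}")
```

Because the monopole term is zero, the error is governed mainly by the neglected quadrupole term and shrinks roughly in proportion to the ratio of the voxel diameter to $R$.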

An equivalent dipole is important in calculations of electric and magnetic 
fields because it allows the simplification of a very complex system involving 
many billions of charges to one involving a finite number of elements. A voxel 
may contain a large number of charges, but as a whole it can be represented 
as an equivalent dipole with direction and magnitude representative of the 
configuration of the charges within the volume. Again this is valid only at 
large distances relative to volume size. 

The problem with using equivalent electric dipoles in biological systems is 
that the assumption that positive and negative charges balance each other may 
not be valid. Monopole contributions cannot necessarily be ignored. Instead, 
it is more practical to consider current dipoles, discussed next. 


2.1.3 Equivalent Current Dipoles 


A current dipole is defined in terms of current sources and sinks rather than 
positive and negative charges, as shown in Figure 2.5(b). By convention, a 
current source is a positive monopole, whereas a sink is a negative monopole. 

Mathematically, current monopoles are analogous to the charge monopoles of Equation 2.3. The principle of superposition implies that many sources and sinks contribute to the overall electric potential linearly:

$$\phi(\mathbf{r}_R) = \sum_{n=1}^{N}\phi_n(\mathbf{r}_R) = \frac{1}{4\pi\sigma}\sum_{n=1}^{N}\frac{I_n}{R_n} \tag{2.6}$$

where the current $I_n$ determines the strength and polarity of each current monopole flowing into a medium with conductivity $\sigma$, measured in Siemens per meter (S/m, described later), and $R_n$ is the distance from the $n$-th monopole to $\mathbf{r}_R$.

The combination of one positive and one negative current monopole yields a current dipole; the voltage now decreases in strength as $1/R^2$ at large distances:

$$\phi_{\text{dipole}}(\mathbf{r}_R) \approx \frac{I\,d\cos\theta}{4\pi\sigma R^2} \tag{2.7}$$

The strength is proportional to the separation $d$.

Figure 2.5(c) also applies to a volume containing multiple current dipoles with approximately equal numbers of current sinks and sources. Unit volumes can be described by an equivalent current dipole, whose strength and orientation are determined by the distribution of the monopoles within. For a current dipole the arrow in this figure indicates the nett direction of current flow in the volume. It is equivalent to the electric dipole because this current occurs in the direction of the nett field formed by the charges.

At this point electrical engineers will wonder ‘How can a current source or sink exist? Currents require closed loops and charge must always be conserved.’ This is true, and in this case too charge does not magically appear or disappear. However, the properties of the biology allow approximations by which current sources and sinks can be used. These approximations are discussed in Section 2.2.1. For now, regardless of the feasibility of current dipoles, the important result is that current dipoles are mathematically equivalent to charge dipoles from the voltage point of view.


2.1.4 Macro-Scopic Mean Fields — Homogeneous Media 


Macro-scopic potentials $\phi(\mathbf{r}_R)$ at low frequencies, recorded at location $\mathbf{r}_R$ and generated by many voxels, each represented by an equivalent current dipole $\mathbf{P}(\mathbf{r}_s)$, are calculated by integrating the contribution of all sources contained within the volume conductor:

$$\phi(\mathbf{r}_R) = \iiint_{\text{volume}} \mathbf{P}(\mathbf{r}_s)\cdot\mathbf{G}(\mathbf{r}_s,\mathbf{r}_R)\,dV \tag{2.8}$$

$\mathbf{G}(\mathbf{r}_s,\mathbf{r}_R)$ is known as the Green's function of the volume conductor. It contains all the information about the electrical properties of the material and its geometry. Each equivalent dipole $\mathbf{P}$ located at $\mathbf{r}_s$ is projected (by the dot product, denoted $\cdot$) to recording location $\mathbf{r}_R$ through the volume conductor. The effects of the volume conductor on $\mathbf{P}$ between $\mathbf{r}_s$ and $\mathbf{r}_R$ are given by $\mathbf{G}(\mathbf{r}_s,\mathbf{r}_R)$. The total potential $\phi(\mathbf{r}_R)$ is the linear superposition of the projections of all dipoles $\mathbf{P}$ onto location $\mathbf{r}_R$.

For an ideal case where all sources are current dipoles and there is only one homogeneous material in the volume conductor, Green's function depends geometrically only on distance and direction:

$$\mathbf{G}(\mathbf{r}_s,\mathbf{r}_R) = \frac{1}{4\pi\sigma}\,\frac{\mathbf{r}_R - \mathbf{r}_s}{\|\mathbf{r}_R - \mathbf{r}_s\|^3} \tag{2.9}$$


The electrical properties of the material ($\sigma$) are also part of the above equation, but in the homogeneous case they enter only as a constant scale factor, $1/(4\pi\sigma)$.
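A minimal Python sketch of Equations 2.8 and 2.9 for a homogeneous medium, assuming each voxel is summarised by one equivalent current dipole moment $\mathbf{P}$ (in A·m), so that the volume integral becomes a sum of $\mathbf{P}\cdot\mathbf{G}$ terms. The conductivity of 0.3 S/m is the nominal brain value of Table 2.1; the dipole strength and recording position are illustrative. For a single dipole the result should match Equation 2.7.

```python
import math

SIGMA = 0.3  # S/m; nominal brain conductivity (Table 2.1), assumed homogeneous


def greens_homogeneous(r_s, r_R, sigma=SIGMA):
    """Equation 2.9: G = (r_R - r_s) / (4 pi sigma |r_R - r_s|^3)."""
    diff = [b - a for a, b in zip(r_s, r_R)]
    R = math.sqrt(sum(c * c for c in diff))
    return [c / (4 * math.pi * sigma * R ** 3) for c in diff]


def potential_from_voxels(voxels, r_R, sigma=SIGMA):
    """Discrete version of Equation 2.8.

    voxels is a list of (r_s, P) pairs, where P is the equivalent current dipole
    moment of one voxel in A*m (the integral over the voxel volume is assumed to
    be already taken), so the volume integral becomes a sum of P . G terms.
    """
    total = 0.0
    for r_s, P in voxels:
        G = greens_homogeneous(r_s, r_R, sigma)
        total += sum(p_k * g_k for p_k, g_k in zip(P, G))  # dot product P . G
    return total


# Illustrative check against Equation 2.7: one current dipole I*d pointing along +z.
I, d = 1e-6, 1e-3                       # 1 uA over 1 mm (illustrative values)
theta, R = math.radians(40), 0.08
r_R = (R * math.sin(theta), 0.0, R * math.cos(theta))
phi = potential_from_voxels([((0.0, 0.0, 0.0), (0.0, 0.0, I * d))], r_R)
phi_eq27 = I * d * math.cos(theta) / (4 * math.pi * SIGMA * R ** 2)
print(phi, phi_eq27)
```

Since $\sigma$ enters only through the factor $1/(4\pi\sigma)$, doubling it halves every potential without changing the spatial pattern.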


For non-magnetic materials two quantities can unambiguously describe their electrical properties:

1. Permittivity $\varepsilon(f)$ describes the ability of a material to store charge. Permittivity is measured in Farads per meter (F/m); the related dielectric constant is the permittivity relative to that of free space.

Permittivity is a property of the material that does not depend on its geometry, whereas capacitance, measured in Farads, does.

2. Conductivity $\sigma(f)$ is a measure of the material's ability to conduct electric charge, that is, how easily charge moves through it. Higher conductivities allow charges to flow more freely. It is measured in Siemens ($\mathrm{S} = 1/\Omega$) per meter. Its inverse is called resistivity $\rho(f)$ and may be used instead.

Conductivity and resistivity are properties of the material that do not depend on its geometry, whereas conductance and resistance do.


When the material is not homogeneous and isotropic its ge- 
ometry also plays a role. 


It is shown in Section 2.3 that at EEG frequencies capacitive effects can be ignored and conductivity is independent of frequency, as implied in Equation 2.9. Conductivity $\sigma$ is therefore the most important quantity in determining the potentials $\phi(\mathbf{r}_R)$ at EEG frequencies.


2.1.5 Macro-Scopic Mean Fields — Inhomogeneous Media 


When a volume is not homogeneous, that is, the properties of the material are not uniform throughout, the differences in material properties become important in estimating $\phi(\mathbf{r}_R)$. Potentials across the boundary between two materials with different conductivities ($\sigma_{m1}$ and $\sigma_{m2}$) are expected to change, but must comply with the following boundary conditions:

$$\sigma_{m1}\frac{\partial\phi_{m1}}{\partial n} = \sigma_{m2}\frac{\partial\phi_{m2}}{\partial n} \tag{2.10}$$

$$\frac{\partial\phi_{m1}}{\partial t} = \frac{\partial\phi_{m2}}{\partial t} \tag{2.11}$$

where $n$ refers to the direction normal to the surface and $t$ to directions tangential to the surface.

Equation 2.10 says that the change of the potential in the direction normal to the boundary between two media is inversely proportional to the conductivity of that medium. It is a consequence of conservation of charge: it expresses the requirement that the normal component of the current density be continuous across the surface. Current cannot appear or disappear at the boundary.

Equation 2.11 says that the change of the potential in a direction tangential to the boundary between two media is the same on either side of this boundary. It is a consequence of conservation of work: it dictates that potentials must be continuous across a boundary. To see how the above equations are derived, refer to Appendix 2.B.
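The boundary conditions can be made concrete with the nominal conductivities of Table 2.1. The sketch below applies Equation 2.10 at a CSF/skull interface, using the radial skull value, and checks that the normal current density $-\sigma\,\partial\phi/\partial n$ is the same on both sides. The gradient of 1 V/m on the CSF side is an arbitrary illustrative value.

```python
import math

# Nominal conductivities (S/m) taken from Table 2.1.
SIGMA = {"brain": 0.3, "csf": 1.5, "skull_radial": 0.015, "scalp": 0.3}


def normal_gradient_other_side(dphi_dn_1, sigma_1, sigma_2):
    """Equation 2.10 solved for side 2: dphi2/dn = (sigma_1 / sigma_2) * dphi1/dn."""
    return sigma_1 / sigma_2 * dphi_dn_1


def tangential_gradient_other_side(dphi_dt_1):
    """Equation 2.11: the tangential derivative is the same on both sides."""
    return dphi_dt_1


# Illustrative: a normal gradient of 1 V/m just inside the CSF at the CSF/skull boundary.
g_csf = 1.0
g_skull = normal_gradient_other_side(g_csf, SIGMA["csf"], SIGMA["skull_radial"])

# The normal current density J_n = -sigma * dphi/dn must match on both sides.
J_csf = -SIGMA["csf"] * g_csf
J_skull = -SIGMA["skull_radial"] * g_skull
print(f"dphi/dn in skull: {g_skull:.1f} V/m; J_n: {J_csf:.3f} vs {J_skull:.3f} A/m^2")
print(math.isclose(J_csf, J_skull))  # True: current continuity holds
```

Because the skull conducts about 100 times less than the CSF in this table, the normal potential gradient inside the skull is about 100 times steeper than just outside it.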

The geometry of the materials is also important. A volume conductor taken as a whole can be inhomogeneous, and possibly anisotropic, even when each region is approximated as both homogeneous and isotropic. Equation 2.9 must be adapted to incorporate all this information. As a consequence, the corresponding Green's function may become complicated, and numerical methods must be employed to compute Equation 2.8.


2.2 Current Sources in Biological Tissue 


This section focuses on the generators of electrical activity within the brain. It is accepted that the charge (or current) distribution in the brain is responsible for generating the EEG waveforms. No sources of interest exist in the CSF, skull or scalp; indeed, the purpose of the EEG is to measure activity in the brain. Activity arising from any other source is responsible for artifacts, for example muscle activity from the scalp and eye blinks, as well as others described in Chapter 1.

This section explores the type of activity (charge and current distributions 
within the brain) that is expected to affect EEG the most. 

The human brain is composed of excitable neural tissue interacting with 
itself. The fundamental unit of the brain is the neuron, which integrates 
positive and negative inputs from dendritic currents produced by incoming 
action potentials generated by other neurons. When a threshold of activ- 
ity is reached the neuron in turn generates its own action potential, thereby 
continuing the transmission of information. The resulting macro-scopic EEG 
recordings are predominantly influenced by the micro-scopic synaptic struc- 
ture (Section 2.2.1), meso-scopic and micro-scopic cortical structure (Section 
2.2.2) as well as the temporal distribution of activity (Section 2.2.3). 


2.2.1 Synaptic Structure and Current Dipoles 


In the EEG the charges responsible for generating electric fields are those 
contained in the cortical layers of the brain (as explained in detail in Section 
2.2). Ideally the distribution of all charges contained within the cortex should 
be used to estimate these fields. This is a computationally intractable problem 
that is made simpler by approximating small discrete volumes, or voxels, as one
equivalent electric dipole, as discussed in Section 2.1.2. The total potential at 
the EEG recording site is then estimated by the principle of superposition. 

However representing each voxel as an equivalent electric dipole requires 
knowledge of the charge distribution within it. No such knowledge exists. 
Furthermore, because charge concentrations vary across the cortex the as- 
sumptions that allow the division into discrete voxels approximated as a single 
electric dipole do not hold. 

Instead electrophysiologists prefer to use a current dipole to describe each voxel. As discussed in Section 2.1.3, real current dipoles do not exist because any current always requires a return path: it never appears from, or disappears into, nowhere. However, when estimating the EEG, certain physiological processes in the brain can be approximated as current dipoles.

FIGURE 2.6: An example of the current loops formed by a single neuron. Current (in this case positive ions) flows into and out of the cell only at ion gates and synapses. This trans-membrane flow creates charge differentials: a net negative charge at the current sinks, denoted −, and a net positive charge at the current sources, denoted +. Extra-cellular currents are caused by this difference in charge distribution. In the cortex these extra-cellular currents are normal to the surface of the cortex because the neurons are aligned in this direction. The return intra-cellular currents can be ignored for the purposes of describing the EEG because of the highly resistive cell membrane. Because ion gates are very small relative to the size of the neuron, the flow of ions into and out of the cell can be approximated as current sources and current sinks. This approximation allows the use of current dipoles instead of electric dipoles to model sources within the cortex.

To see this, refer to Figure 2.6, where an example of a closed current loop generated by a single neuron is shown. Positive and negative charges (ions) in the brain can flow in the intra-cellular or extra-cellular space. Very little current (on average) flows through neuron membranes because they are highly impermeable to ions under normal conditions. At certain points (e.g., synapses and ion gates) the membranes can adapt their permeability to specific ions in response to incoming stimuli, resulting in a sudden influx or efflux of current into or out of the cell. This causes a charge differential that generates extra-cellular currents. Inside the neuron the intra-cellular currents form a return path so that charge is conserved. Both intra-cellular and extra-cellular currents are shown in this figure.

Because the membrane is highly resistive, and as a direct result of the 
boundary conditions listed in Section 2.1.5, the potentials generated by the 
intra-cellular currents can be ignored at macro-scopic scales. As far as the 
EEG measurement is concerned these currents do not exist — it is only the 
extra-cellular currents that contribute to the EEG. Furthermore, because the 
physical size of synapses and gates is very small (relative to the size of the 
neuron), the flow of current across the membrane can be approximated as a 
current source or a sink. 

This approximation allows current dipoles to be used as a model for the 
brain generators, helped along by the availability of a significant volume of 
knowledge on the distribution of gates in the neocortex. Equivalent informa- 
tion about charge distributions is not as readily interpretable. Furthermore, 
the structure of the cortex is such that discrete voxels can be defined so long 
as these are large relative to the size of a neuron. Within a voxel there is 
an equal number of current sources and sinks because each current source or 
sink is compensated by current sources and sinks distributed on other regions 
of the dendritic and somatic structure. Thus monopole contributions can be 
ignored, and using a single current dipole $\mathbf{P}(\mathbf{r}_s)$ to describe a voxel is valid.


At macro-scopic scales the sudden flow of positive and nega- 
tive ions across the cell membrane, caused by gating mecha- 
nisms that include EPSPs, IPSPs, voltage-triggered gates and 
ion pumps, can be approximated as current sources or sinks 
respectively. This is possible because 


• These gates are small relative to the cell body, and
• Intra-cellular return currents do not affect the macro-scopic EEG.


Because statistical information on the distribution of these gates exists, current dipoles rather than electric dipoles are a better choice to model EEG generation: current sources and sinks exist naturally in the brain.


Only extra-cellular currents affect the EEG. A single dipole $\mathbf{P}(\mathbf{r}_s)$ oriented in the direction of this current flow can be used to describe a discrete volume or voxel of cortex because an equal number of current sources and sinks exist within it.


A suitable scale for a voxel depends on the composition of the cortex. Each 
cortical neuron has between 10? — 10^ synapses. A minicolumn is composed 
of about 100 pyramidal cells and about 10° synapses and covers a ^ 0.03mm 
diameter of cortex (see Figure 2.2). In turn each mm? (macro-column) of cor- 
tex contains in the order of 10? neurons and 10? synapses [168]. To effectively 
model both intra-cranial and scalp EEG, whilst maintaining computational 
requirements at a reasonable level, a scale in between that of a minicolumn and 
a macrocolumn has been suggested in [121]. Larger scales may be effectively 
used to model scalp EEG with only small errors [114]. 


2.2.2 Spatial Integration 


In order to solve the forward problem, the location and orientation of these dipoles are important. In this section it is shown that significant simplifications can be made based on the organization of the brain.

Imagine a volume conductor in which $N$ equivalent dipole sources with unit magnitude are present. Each source is of the form $\mathbf{P}(\mathbf{r}_s)$, and the total potential recorded at a sufficiently distant location $\mathbf{r}_R$ in a homogeneous medium is given by Equation 2.8. If all $N$ sources are oriented randomly, the expected recorded potential is, on average, equal to zero, because the potentials formed by the dipole sources destructively interfere. The recorded potential in each trial fluctuates randomly, with a standard deviation (referred to here as its energy) proportional to $\sqrt{N}$. A simulation to verify this is presented in Figure 2.7(a). The normalized voltage is computed for 1000 trials, each with $N = 50{,}000$ randomly oriented sources, and the overall probability distribution of resultant voltages is shown. The resulting voltage has a mean of 0 and a standard deviation (energy) proportional to $\sqrt{N} \approx 224$.

In contrast, if all $N$ sources are aligned, the resulting field is proportional to $N$ because of constructive interference. Figure 2.7(b) shows the number $N$ of parallel sources required to generate a field equivalent to that of $M$ randomly aligned sources. With $N = 200$, fields of the same order as those of $M = 40{,}000$ randomly aligned sources can be created; in other words, a number of aligned sources less than 1% of $M$ produces a comparable field.
Aligned sources produce a voltage that is much stronger than that of an equal number of randomly oriented ones. The potentials generated by $N$ aligned sources become undetectable only when there are $\gg N^2$ randomly aligned ones. This relationship is shown in Figure 2.7(b).
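The $\sqrt{N}$ behaviour behind Figure 2.7 can be reproduced with a short Monte Carlo sketch. To keep it simple and fast, each unit source is assumed to contribute either $+1$ or $-1$ with equal probability, a simplification of the random 3-D orientations used for the figure; the number of positive contributions is obtained by counting the set bits of a random $N$-bit integer. The trial count and seed are illustrative.

```python
import math
import random


def random_polarity_sum(n_sources, rng):
    """Total normalised voltage of n unit sources, each contributing +1 or -1.

    The number of +1 contributions is the number of set bits in a random
    n-bit integer, which is equivalent to n independent fair coin flips.
    """
    n_plus = bin(rng.getrandbits(n_sources)).count("1")
    return 2 * n_plus - n_sources


def aligned_sum(n_sources):
    """All sources aligned: the contributions add constructively."""
    return n_sources


def trial_statistics(n_sources, n_trials=1000, seed=0):
    """Mean and sample standard deviation of the random-polarity sum over trials."""
    rng = random.Random(seed)
    values = [random_polarity_sum(n_sources, rng) for _ in range(n_trials)]
    mean = sum(values) / n_trials
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n_trials - 1))
    return mean, std


N = 50_000
mean, std = trial_statistics(N)
print(f"mean = {mean:.1f}, std = {std:.1f}, sqrt(N) = {math.sqrt(N):.1f}")

# M randomly oriented sources match N aligned ones when sqrt(M) = N, i.e. M = N**2.
N_aligned, M_random = 200, 40_000
print(aligned_sum(N_aligned), math.sqrt(M_random))
```

With genuinely random 3-D orientations each projected contribution is uniformly distributed between $-1$ and $+1$, so the spread becomes $\sqrt{N/3}$: smaller, but still proportional to $\sqrt{N}$.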


Since EEG recorded voltages are non-zero and not randomly distributed (patterns are observed), the dipole orientations cannot be random. Histological classification of different regions of the brain has revealed that although
sources in deeper structures are approximately randomly oriented, the pyra- 
midal neurons in the cortex have their dendritic structure aligned. The effects 
of this are discussed next. 


2.2.2.1 Cortical Structure 


Approximately 85% of neurons in the cortex are pyramidal cells that are
aligned in columnar structures normal to the cortical surface [121, 168]. The 
current sinks and sources in this configuration occur at different levels along 
this structure. One such configuration, where all current sinks are on the 
dendrites and all current sources at the cell body, is shown in Figure 2.6. 

Regardless of the distribution of current sources and sinks, the current flow 
is vertical because the sinks and sources produce tiers of charge differentials, 
also shown in Figure 2.6. Vertical current flow is promoted by the alignment 
of neural fibers in this direction. Hence the meso-scopic source approximation 
P(rs) is always fixed in an orientation normal to the cortical surface, as 
originally depicted in Figure 2.5(c). 

The neocortex is sufficiently thin (~2 mm) that the voxel represented by $\mathbf{P}(\mathbf{r}_s)$ can span the depth of the cortex. As such, and because there are no sources below the neocortex that significantly contribute to the EEG (due to the random neuron structures as well as their distances from recording sites), the cortex (1500 to 3000 cm² in its entirety, including fissures) may be modeled as a folded sheet rather than a volume of dipoles [114, 122]. The potential at an arbitrary recording site $\mathbf{r}_R$, originally defined as a volume integral in Equation 2.8, can instead be simplified to a surface integral

$$\phi(\mathbf{r}_R) = \iint_{\text{cortical surface}} \mathbf{P}(\mathbf{r}_s)\cdot\mathbf{G}(\mathbf{r}_s,\mathbf{r}_R)\,dA \tag{2.12}$$

where in this case the orientation of $\mathbf{P}(\mathbf{r}_s)$ is always perpendicular to the cortical surface. This dipole sheet is considered to be the major contributor to scalp-recorded EEGs.
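Equation 2.12 can be checked on a case with a closed-form answer: a flat, uniformly active disc of radius $a$ (a hypothetical stand-in for a small flat patch of cortex) carrying dipole moment per unit area $m$, oriented normal to the disc, in a homogeneous medium. On the disc axis at height $h$ the surface integral equals $\frac{m}{2\sigma}\left(1 - \frac{h}{\sqrt{a^2+h^2}}\right)$. The sketch below discretises the surface into thin rings and compares the two; $m$, the radii and the heights are illustrative.

```python
import math


def disc_sheet_potential(m, a, h, sigma=0.3, n_rings=2000):
    """Discretised Equation 2.12 for a flat, uniformly active disc of radius a.

    m is the dipole moment per unit area (A/m), oriented normal to the disc (+z);
    the recording point lies on the disc axis at height h above it. By symmetry the
    surface is split into thin rings, each equidistant from the recording point.
    """
    total = 0.0
    dr = a / n_rings
    for i in range(n_rings):
        rho = (i + 0.5) * dr              # ring mid-radius (midpoint rule)
        dA = 2 * math.pi * rho * dr       # ring area
        dist = math.hypot(rho, h)         # |r_R - r_s|
        # P . G with P = m dA z_hat and G from Equation 2.9: (r_R - r_s) . z_hat = h
        total += m * dA * h / (4 * math.pi * sigma * dist ** 3)
    return total


def disc_sheet_closed_form(m, a, h, sigma=0.3):
    """Closed form of the same integral: (m / (2 sigma)) * (1 - h / sqrt(a^2 + h^2))."""
    return m / (2 * sigma) * (1 - h / math.hypot(a, h))


m = 1e-7                        # A/m, illustrative
for a in (0.005, 0.02, 0.1):    # disc radius (m)
    print(a, disc_sheet_potential(m, a, 0.01), disc_sheet_closed_form(m, a, 0.01))
```

For a sheet much wider than the recording distance ($a \gg h$) the on-axis potential tends to $m/(2\sigma)$, independent of $h$.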
FIGURE 2.7: (a) The probability distribution of voltages generated by 1000 trials, where in each trial $N = 50{,}000$ randomly oriented dipoles in a volume conductor are simulated. Each source generates a normalized voltage with magnitude 1, but with random orientation. The mean is $\mu \approx 0$ and the energy (standard deviation) is proportional to $\sqrt{N} \approx 224$. (b) Relationship between the number of random and parallel oriented sources that can generate equivalent fields. Simulations are averaged over 1000 trials. An example is drawn where 200 parallel sources generate a field comparable to that of 40,000 randomly oriented dipoles, that is, fewer than 1% as many sources are needed when they are aligned. Hence parallel sources are most likely the generators of EEG.


A current dipole $\mathbf{P}(\mathbf{r}_s)$ on the cortex has orientation normal to the surface because extra-cellular current flow occurs in this direction. This alignment makes cortical activity the single most important contributor to EEG measurements.

A sheet rather than a volume of dipoles can be used to model the total potential $\phi(\mathbf{r}_R)$ because the cortex is thin.
2.2.2.2 Cortical Folds


A dipole sheet is appropriate to model the cortex in its entirety, although special care must be taken with the interpretation of these dipoles. $\mathbf{P}(\mathbf{r}_s)$ is normal to the cortical surface, but not necessarily radially oriented in the spherical head, because the cortex is a folded structure. The crown of a fold is called a gyrus and dipoles here are predominantly radial. The valley is called a sulcus and dipoles on the walls of the sulci are predominantly tangential. Both are shown in Figure 2.8(a), where each arrow is representative of the current flow (or the equivalent current dipole) of a small volume of cortical tissue. The folds in the cortex are another reason that the diameter of the voxel represented by $\mathbf{P}(\mathbf{r}_s)$ must be small: for the orientation of $\mathbf{P}(\mathbf{r}_s)$ to remain constant over this volume, the pyramidal structures must be aligned within it.

The activation of large areas of the cortex simultaneously implies that 
frequently the entire sulcus and gyrus are simultaneously and synchronously 
active. In this case, tangential dipoles in the fissures tend to cancel out. This 
is shown in Figure 2.8(b), where the simulated potential generated by the 
entire structure in (a) varies very little from that generated only by the gyrus. 


When the entire sulcus and gyrus are synchronously active, the tangential dipoles on the fissures affect recorded potentials very little and can be ignored. Only radial contributions are important.


If only part of the sulcus is active this cancellation effect does not occur 
and the tangential dipoles do contribute to EEG recordings. This is also 
shown in Figure 2.8(b), where the simulation is repeated with only the left half of
the sources in (a) are active. However, it is worth noting that the tangential 
dipoles are generally located deeper within a structure, thereby contributing 
less than a radial dipole of the same magnitude. This is discussed in more 
detail in Section 2.4. 
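The cancellation on the walls of a sulcus can be illustrated with a hypothetical, highly idealised geometry: two facing vertical walls whose dipoles point into the sulcal gap, in an infinite homogeneous medium. The sketch below compares how the potential falls off between distances $R$ and $2R$ when both walls are active and when only the left wall is active. Wall width, depth, dipole count and strength, distance and viewing angle are all assumptions.

```python
import math

SIGMA = 0.3  # S/m, homogeneous medium assumed


def dipole_potential(r_s, P, r_R, sigma=SIGMA):
    """Single current dipole via Equations 2.8 and 2.9: P . (r_R - r_s) / (4 pi sigma R^3)."""
    diff = [b - a for a, b in zip(r_s, r_R)]
    R = math.sqrt(sum(c * c for c in diff))
    return sum(p * c for p, c in zip(P, diff)) / (4 * math.pi * sigma * R ** 3)


def sulcus_walls(width=1e-3, depth=1e-2, n=50, p=1e-9, left_only=False):
    """Hypothetical idealised sulcus: two vertical walls at x = -width/2 and x = +width/2,
    each carrying n dipoles from z = 0 down to z = -depth. The dipoles are normal to the
    walls and therefore tangential; on facing walls they point in opposite directions."""
    dipoles = []
    for i in range(n):
        z = -(i + 0.5) * depth / n
        dipoles.append(((-width / 2, 0.0, z), (+p, 0.0, 0.0)))      # left wall
        if not left_only:
            dipoles.append(((+width / 2, 0.0, z), (-p, 0.0, 0.0)))  # right wall
    return dipoles


def total_potential(dipoles, r_R):
    return sum(dipole_potential(r_s, P, r_R) for r_s, P in dipoles)


def falloff_ratio(dipoles, R, angle_deg=45.0):
    """phi(R) / phi(2R) along a fixed direction in the x-z plane:
    about 4 for a 1/R^2 (dipole) decay, about 8 for a 1/R^3 decay."""
    u = (math.sin(math.radians(angle_deg)), 0.0, math.cos(math.radians(angle_deg)))
    near = total_potential(dipoles, tuple(R * c for c in u))
    far = total_potential(dipoles, tuple(2 * R * c for c in u))
    return near / far


both = sulcus_walls()
left = sulcus_walls(left_only=True)
print("both walls active:", falloff_ratio(both, 0.5))
print("left wall only:   ", falloff_ratio(left, 0.5))
```

With both walls active the opposing dipoles sum to zero nett dipole moment, so the remaining potential behaves like a quadrupole (ratio near 8, i.e. $1/R^3$) and is much smaller; with one wall active a nett tangential dipole remains (ratio near 4, i.e. $1/R^2$).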

The simulations presented in Figure 2.8 concern very regular fissures and
sulci that are not typical of the irregular foldings observed in a real cortex. 
However, the trends indicate that tangential dipoles do not affect the intra- 
cranial EEG because the electrodes record local activity and are not sensitive 
to relatively distant sulci. Scalp electrodes record larger areas but the con- 
tribution of the sulci remains minimal because of the distances involved (see 
Section 2.4). Thus in modeling the EEG the cortical folding is typically ig- 
nored and the assumption that all dipoles are radial is enforced. 
FIGURE 2.8: (a) An idealized configuration of current dipole sources present
in a sulcus, used for simulation. Each arrow represents a compartment or voxel 
on the cortex. Each voxel is assumed to be approximately self-contained in 
that charge is conserved and a current loop such as that shown in Figure 2.6 
is present. Only extra-cellular currents contribute to the EEG. In (b) normal- 
ized potentials are simulated at the different angular locations shown in (a), 
under the assumption that all sources are synchronously active. The poten- 
tial generated when all sources (gyrus and sulcus) are included in simulation 
is very similar to those when only the gyrus is included. This is because of 
the significant cancellation effects that result in minimal contributions from 
tangential dipoles. In this case the contributions of the sulcus can be ignored. 
When only half the sources are active (e.g., left half, simulated in (b)) these 
cancellation effects do not exist and the contributions of sulci cannot be ig- 
nored. However, at scalp locations the distances are large enough so that 
tangential contributions remain relatively small. 
FIGURE 2.9: (a) Dendritic response used for simulation, with typical rise and fall times as described in Chapter 6. (b) Relationship between the number of synchronized sources $N$ and asynchronous sources $M$, averaged over 1000 trials. This relationship is similar to that observed in Figure 2.7(b): the synchronous activity of a few sources generates large responses that can, on average, only be equaled by many asynchronous sources. Equal magnitude responses occur when $M = N^2$.


2.2.3 Temporal Integration


So far all time dependence of potentials has been ignored because of the quasi- 
static approximation that allows the potential field at any one time to be 
described by static sources. However it is necessary to consider how the po- 
tentials generated are a consequence of the temporal arrangement of dipoles. 
This arrangement is a result of the dynamics (discussed in Section 2.4.2 and 
in more detail in Chapter 6) but the resulting measured field is dependent on 
the synchronicity of the dipoles. 

When the mesosources in the cortex are synchronized, the neurons in this area behave similarly: they are active at the same time and with the same phase (that is, the flow of current is in the same direction). The minicolumn, comprising about 100 neurons, has been proposed as the basic functional unit of neural circuits. Neurons within a minicolumn are expected to mostly act in synchrony [121]. At the macro-scopic scale, however, recordings are affected by very large areas (of the order of 20 cm² for scalp recordings and 1 cm² for intra-cranial recordings, see later). Even for intra-cranial recordings these areas can contain hundreds of thousands of minicolumns. How does the synchronicity between these affect the EEG?


Figure 2.9 describes this relationship. A typical dendritic tree response to incoming action potentials (shown in (a)), with random polarity (positive or negative), is distributed uniformly over time and integrated to generate the resulting EEG. Simulations of the equivalent energy of the potentials resulting from N synchronized versus M un-synchronized sources are plotted in (b). This relationship is similar to that observed in Figure 2.7: synchronized sources contribute potentials proportional to N, whilst un-synchronized sources contribute energy only in a random manner, with contributions in the order of √M. Hence very few synchronized sources can register large potentials on the EEG, and many more un-synchronized sources are necessary to generate potentials of equivalent magnitude.
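The N versus √M scaling can be reproduced with a minimal sketch. It assumes, as a simplification not taken from the text, that every source contributes a unit potential and that un-synchronized sources differ only by an independent random polarity; timing is ignored. The helper `summed_amplitude` is hypothetical.

```python
import numpy as np


def summed_amplitude(n_sources, synchronized, n_trials=2000, seed=0):
    """RMS amplitude of the summed potential of n_sources unit sources.

    Synchronized sources share one polarity, so the sum is exactly n_sources.
    Un-synchronized sources get an independent random polarity (+1 or -1) on
    each trial, so the RMS of the sum grows only as sqrt(n_sources).
    """
    if synchronized:
        return float(n_sources)
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=(n_trials, n_sources))
    return float(np.sqrt(np.mean(signs.sum(axis=1) ** 2)))


# N synchronized sources versus M = N * N un-synchronized sources.
for n in (10, 20, 40):
    print(n, summed_amplitude(n, True), round(summed_amplitude(n * n, False), 1))
```

Under these assumptions the two columns printed for each N should be close, consistent with equal responses when M = N².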


EEG fields are dominated by the amount of synchronized activ- 
ity, not the quantity of activity. 


Given that the majority of neurons within a column are on average in- 
active [121], and that EEG potentials are relatively large, a natural conclusion 
is that large areas of cortex must be synchronized to generate the observed 
EEG waveforms. Fortunately, the brain structure is such that cortico-cortical 
fibers are capable of activating large areas of cortex at any one time. If this 
were not the case, the observed EEG would be meaningless. 


This also explains why action potentials have been ignored in the discus- 
sions relating to the measurement of EEG. For a signal to register on the 
EEG there must be a high level of synchronization between sources. Action 
potentials are short in duration (much higher frequencies) and so they do not 
superimpose as often as do synaptic return currents, which are slower and have 
a much longer response (see Figure 1.5(b)). Action potentials do contribute 
to the EEG, but their contribution is minimal. 
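The role of event duration can be illustrated with a toy model, assumed here rather than taken from the text: events start at random (Poisson) times at a fixed rate and each stays active for a fixed duration, a short event standing in for an action potential and a long one for a synaptic return current. For this model the average number of simultaneously active events is rate × duration, so at equal rates a 20 ms event overlaps with others about twenty times more than a 1 ms event. The rates and durations are illustrative only, and `mean_overlap` is a hypothetical helper.

```python
import numpy as np


def mean_overlap(rate_hz, duration_s, t_total_s=100.0, dt=1e-4, seed=0):
    """Average number of simultaneously active events in a toy event-train model.

    Event onsets are scattered uniformly over the record with a Poisson-distributed
    count (a discretized Poisson process); each event is a rectangular pulse
    lasting duration_s. The expected result is rate_hz * duration_s.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(t_total_s / dt))
    width = int(round(duration_s / dt))
    starts = rng.integers(0, n_steps, size=rng.poisson(rate_hz * t_total_s))
    edges = np.zeros(n_steps + width)
    np.add.at(edges, starts, 1.0)            # event switches on
    np.add.at(edges, starts + width, -1.0)   # event switches off
    active = np.cumsum(edges)[:n_steps]      # events active at each time step
    return float(active.mean())


short_events = mean_overlap(50.0, 0.001)   # ~1 ms, action-potential-like
long_events = mean_overlap(50.0, 0.020)    # ~20 ms, synaptic-current-like
print(short_events, long_events)
```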


The synaptic return currents are the predominant contributors 
to the global EEG. Action potentials are important for the un- 
derlying dynamics but not to model the EEG [118]. 
Tissue                   Conductivity (S/m)   σ/σ_brain
σ_brain                  0.3                  1
σ_CSF                    1.5                  5
σ_skull (radial)         0.015                1/20
σ_skull (tangential)     0.075                1/4
σ_scalp                  0.3                  1

TABLE 2.1: Nominal head tissue conductivities. 


2.3 Volume Conducting Properties of the Head 


Once the sources are known, an accurate estimate of the potentials Φ(r_R) at the scalp, or at any other location in the human head, requires a sufficiently accurate description of its volume conducting properties. However, not only is the geometry of the head complex, but the tissues found in it are also inhomogeneous and anisotropic, that is, their electrical properties differ with location as well as direction.

The lack of accurate knowledge of these properties means that certain problems, such as solving the inverse problem (localization of sources in the brain from scalp measurements), are prone to error. However, the purpose of this chapter is to obtain a reasonably good qualitative evaluation of solutions to the forward problem (determining fields at the scalp generated by sources in the brain), for which only rough quantitative concordance is necessary. Several simplifying assumptions make this problem easier.

The remainder of this section discusses simplifications and experimental 
evidence that allow a reasonable estimation of the fields at any point within 
the human head, given an arbitrary source distribution. Simplifications based 
on geometry are discussed first, followed by the electrical properties of the 
materials in the head. This is followed by a review of experimentally observed 
ranges of these electrical properties. 


2.3.1 Head Geometry 


The human head may be divided into approximate layers each composed of 
a different type of tissue. The first and most important simplification in the 
calculation of potentials is to approximate the geometry of the head as a set 
of concentric spheres shown in Figure 2.10, where each region represents one 
of the layers of the head. This is a fairly accurate model within a small section 
of the scalp, which is itself roughly spherical, but not for the entire head [103]. 

The number of concentric spheres used varies depending on the required detail, but can include the brain (gray and white matter), cerebrospinal fluid (CSF), skull (two compact layers and one spongy layer) and scalp. Simulations in this chapter are performed using the 4-sphere model, which includes the effects of the averaged brain, CSF, averaged skull and scalp. The nominal radii chosen for these layers are depicted in Figure 2.10. Implied in this model is a fifth layer, air, whose very low conductivity means that no current flows out of the head, so that the number of charges in the head remains constant. It should be re-iterated that these values are chosen for representative purposes only: it is well known that the radii of the skull vary with location on the head, and that the thickness of the CSF increases with age. It is not the accuracy of these measures that is important here, but rather their ability to provide qualitative descriptions. Results employing these approximations have in any case been shown to be accurate to within 10-20% when compared to more realistic head models [172, 103, 114].

FIGURE 2.10: The 4-sphere approximate model of the human head.

An important consequence of this reduction in complexity is that, for typical source distributions, Equation 2.12 can be expressed as a weighted sum of basis functions known as spherical harmonics. These apply only to systems that can be described within a spherical volume, and are the spatial analogue of decomposing a time-dependent signal into a sum of sines and cosines. Equation 2.12 can be rewritten as


$$\Phi(\mathbf{r}_R) \approx \sum_{n=0}^{\infty} \sum_{m=-n}^{n} P_{n,m}\, G_{n,m}(\mathbf{r}_s, \mathbf{r}_R)\, Y_{n,m}(\theta, \phi), \qquad (2.13)$$


where (θ, φ) are the spherical co-ordinates defined in Figure 2.11(a), n is the harmonic number (a measure of approximate spatial frequency), m is the order, which represents the different directions of each n, and Y_{n,m} is the spherical harmonic basis function associated with (n, m). The coefficients P_{n,m} G_{n,m}(r_s, r_R) replace P(r_s) · G(r_s, r_R) (the description of the source distribution and the Green's function) in Equation 2.12. These quantities now depend on n and m.

FIGURE 2.11: (a) Spherical co-ordinate system (θ, φ, r) relative to the Cartesian co-ordinate system (x, y, z). The angles θ (elevation) and φ (azimuth) and the radius r are defined as shown. (b) and (c) show examples of Y_{n,m}, the spherical harmonic basis functions in Equation 2.13. Black represents a negative value, and gray a positive value. Any distribution within a sphere can be defined in terms of these basis functions, summed over all possible (n, m). In (b) an example is shown in which the harmonic number n remains constant and m is varied: m changes the direction of the basis function at harmonic number n. In (c) n is varied to show that higher harmonic numbers represent higher spatial frequencies, or finer spatial scales.

Each spherical harmonic Y_{n,m} is orthogonal to all other harmonics (the integral over the sphere of the product of any two distinct harmonics is zero), so that Equation 2.13 has zero redundancy. Examples for n = 3 and m = 0, 1, 2 can be seen in Figure 2.11(b). Here the approximate scale of components in Y_{n,m} is the same when n is constant, but its direction depends on m. Another example in (c) shows that different n relate to basis functions of different spatial frequencies.
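Orthogonality can be checked numerically. The sketch below is an illustration under stated assumptions, not the computation behind Figure 2.11: it builds unnormalized real spherical harmonics from NumPy's Legendre polynomials, taking θ as the polar angle from the z-axis (the usual physics convention, which may differ from the elevation angle drawn in Figure 2.11(a)), integrates every pairwise product over the unit sphere with Gauss-Legendre quadrature in cos θ and a uniform grid in φ, and returns the resulting Gram matrix. The helper names `real_sph_harm` and `gram_matrix` are hypothetical.

```python
import numpy as np
from numpy.polynomial.legendre import Legendre, leggauss


def real_sph_harm(n, m, x, phi):
    """Unnormalized real spherical harmonic of degree n and order m.

    x = cos(theta), theta being the polar angle from the z-axis. m >= 0 uses
    cos(m*phi) and m < 0 uses sin(|m|*phi). The Condon-Shortley sign and the
    normalization constant are omitted; neither affects orthogonality.
    """
    k = abs(m)
    # Associated Legendre function P_n^k(x) = (1 - x^2)^(k/2) * d^k P_n / dx^k
    p_nk = (1.0 - x**2) ** (k / 2.0) * Legendre.basis(n).deriv(k)(x)
    return p_nk * (np.cos(k * phi) if m >= 0 else np.sin(k * phi))


def gram_matrix(max_n, n_x=40, n_phi=80):
    """Pairwise integrals over the unit sphere of all Y_{n,m} with n <= max_n."""
    x, w = leggauss(n_x)  # exact for polynomials in x up to degree 2*n_x - 1
    phi = np.linspace(0.0, 2.0 * np.pi, n_phi, endpoint=False)
    X, PHI = np.meshgrid(x, phi, indexing="ij")
    weights = np.outer(w, np.full(n_phi, 2.0 * np.pi / n_phi))  # dS = dx dphi
    labels = [(n, m) for n in range(max_n + 1) for m in range(-n, n + 1)]
    ys = [real_sph_harm(n, m, X, PHI) for n, m in labels]
    gram = np.array([[np.sum(weights * a * b) for b in ys] for a in ys])
    return labels, gram


labels, gram = gram_matrix(max_n=4)
off_diagonal = gram - np.diag(np.diag(gram))
print(np.max(np.abs(off_diagonal)))  # should be at round-off level
```

Only the diagonal entries survive; their values depend on the omitted normalization, which is why unit-normalized harmonics are usually used in practice.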

The value of Equation 2.13 is that it is an analytic solution that is also numerically tractable.


2.3.2 Capacitive Effects of Tissue 


This section shows that at macro-scopic scales the capacitive effects of volume 
conduction are minimal. This is not the case for micro-scopic activity, where 
it is the capacitive properties of cellular membranes that allow much of the 
observed dynamic behavior. 

Limited research exists on the frequency-dependent permittivity of biological tissues. Conductivity, on the other hand, has been widely studied and shown to be roughly constant at EEG frequencies (0-100 Hz) [5, 157]. A review of the frequency dependence of both conductivity and permittivity is presented in [46, 47, 48]. Nominal curves for the permittivity and conductivity of gray matter, cortical (compact) bone and cancellous (spongy) bone, generated with the model presented in this review, are shown in Figure 2.12(a) and (b). These curves are consistent with the values used in Table 2.1.

If a volume conductor is linear and its electrical properties are uniform across space and direction (homogeneous and isotropic), then the total resistance of a path of unit cross-section through the conductor is simply the resistivity 1/σ multiplied by its length L. A linear volume conductor can be approximated as an electrical circuit consisting of a resistor in parallel with a capacitor [103] (see Figure 2.13). For such a system it is shown in Appendix 2.C that the capacitive effects can be ignored if

$$2\pi f\,\varepsilon(f) \ll \sigma(f), \qquad (2.14)$$


where f is the frequency of interest. When this condition holds, the tissue may be treated as purely resistive (described by σ alone).
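The condition can be evaluated directly, and the parallel RC model checked against it. This is a sketch under stated assumptions: a path of unit cross-section with R = L/σ and C = ε/L (standard parallel-plate forms, so that RC = ε/σ and L cancels), and hypothetical tissue values (relative permittivity 10⁵, σ = 0.3 S/m, f = 100 Hz) used as placeholders rather than values read from Figure 2.12. The helper names are hypothetical.

```python
import cmath
import math

EPS0 = 8.854187817e-12  # permittivity of free space (F/m)


def capacitive_ratio(f_hz, eps_rel, sigma):
    """Left-hand side of Equation 2.14 divided by the right: 2*pi*f*eps / sigma."""
    return 2.0 * math.pi * f_hz * eps_rel * EPS0 / sigma


def parallel_rc_response(f_hz, eps_rel, sigma, length_m):
    """Impedance of a path of length L (unit cross-section) modeled as R || C.

    Assumes R = L / sigma and C = eps / L, so R*C = eps / sigma and the result
    does not depend on L. Returns (|Z| / R, phase of Z in degrees).
    """
    r = length_m / sigma
    c = eps_rel * EPS0 / length_m
    s = 2j * math.pi * f_hz
    z = r / (1.0 + s * r * c)  # resistor and capacitor in parallel
    return abs(z) / r, math.degrees(cmath.phase(z))


# Hypothetical tissue values (placeholders, not taken from Figure 2.12).
ratio = capacitive_ratio(100.0, 1e5, 0.3)
gain, phase_deg = parallel_rc_response(100.0, 1e5, 0.3, length_m=0.01)
print(ratio, gain, phase_deg)
```

Because |Z|/R = 1/√(1 + x²) and the phase is −arctan(x) with x equal to the ratio in Equation 2.14, a ratio much smaller than 1 implies negligible attenuation and delay from the capacitive path.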

The ratio 2πfε(f)/σ(f) for gray matter, cortical bone and cancellous bone is shown in Figure 2.12(c) to roughly satisfy this condition. This implies that capacitive effects are small and, as a consequence, the volume conducting properties of the head perform no significant temporal filtering at EEG frequencies. In other words, simultaneous recordings beneath and
above the skull do not experience noticeable delay or deformation, only at- 
FIGURE 2.12: Theoretical frequency-dependent electrical properties of biological tissues found in the head, including gray matter, cortical (compact) bone and cancellous (spongy) bone. (a) shows the permittivities ε(f) relative to the permittivity of free space ε₀, and (b) shows that the conductivities σ(f) are roughly constant for f = 0-100 Hz. Both (a) and (b) are calculated using the models in [48] and agree with experimentally observed values, including those in Table 2.1. (c) shows the ratio described by Equation 2.14, indicating that capacitive effects are small and can effectively be ignored in the 0-100 Hz range.
FIGURE 2.13: If a volume conductor is linear and its electrical properties are uniform, both resistance and capacitance depend on the path length L between any two points. For points A and B shown (per unit cross-sectional area), the total resistance is R = L/σ(f) and the total capacitance is C = ε(f)/L. Here s = 2πfj, where f is frequency in Hz and j = √−1. This circuit is used to derive the approximate condition in Equation 2.14 for which the material's capacitance can be ignored. The derivation is shown in Appendix 2.C.


tenuation. These observations have been supported by experiments dating as 
far back as 1964 in [31]. 

Given this information one may wonder why frequencies higher than about 100 Hz are not present in the scalp EEG. If the volume conducting head performs no filtering of intra-cranial activity, why do recordings near the cortex contain higher frequencies than the scalp EEG? The answer is that the scalp EEG is filtered, but this is a consequence of the underlying synaptic dynamics combined with the spatial filtering properties of the head, discussed in Section 2.4.


2.3.3 Estimating Conductivities 


Estimates of conductivities for regions of inhomogeneous media grouped together must always be space averages over large tissue volumes [122], because smaller volumes may contain irregularities that make estimates unrepresentative of the average properties. The conductivity can be estimated in vitro, by analyzing dead tissue in an environment that simulates live conditions, or in vivo. The latter has recently been demonstrated to be possible non-invasively, by first determining the head geometry using MRI and then using the principle of reciprocity⁵ to estimate conductivities


⁵The principle of reciprocity states that if current is injected through two stimulating
electrodes, the resulting current density or electric field through a volume conductor com- 
[184, 40, 126]. Similar processes can be used anywhere on the human body, for example to estimate the conductivity of muscle [99]. Estimates of the conductivity of head tissue are also possible with simultaneous intra-cranial and scalp recordings [87].

Although the conductivities of biological tissue depend on frequency, for the EEG this dependency has been shown to be small relative to the variation between tissue types [46, 47, 48]. Most estimates therefore explicitly ignore it, although it is imperative that experiments be performed in the appropriate frequency range.

The conductive properties of each of these layers are summarized in Table 
2.1 and are discussed in more detail below. 


2.3.3.1 Brain 


The brain is composed of two types of tissue: gray matter and white matter. 
Gray matter is where the generation of current sources occurs because it is 
where the pyramidal neurons are located. White matter is a collection of axon 
fibers that connect different regions of the brain together. 

The isotropy of gray and white matter was studied by [167] with the use 
of MRI imaging that reflected the electric properties of different tissues in the 
brain relative to orientation. The authors found that whilst gray matter is 
largely isotropic in nature, white matter is highly anisotropic. This is because 
the conductivity parallel to the orientation of the axon fibers is expected 
to be higher than across the fibers [121]. However, because of the lack of 
consistency in fiber orientation no generalizations can be made about the 
anisotropic properties of the brain. 

In any case the differences in conductivities of gray and white matter 
are relatively small, the latter having a slightly lower conductivity, and the 
difference can be ignored for simulation purposes — the principal effects are 
the result of the relatively highly resistive skull. The conductivity of the average brain ranges from σ_brain = 0.12-0.48 S/m (see [40, 87, 126, 164]). A nominal value of σ_brain = 0.3 S/m is selected for the simulations presented here.


2.3.3.2 CSF 


Cerebrospinal fluid (CSF), like other body fluids, has a relatively high conductivity because its high concentration of dissolved salts facilitates the transfer of charge [122]. The conductivity of CSF has been shown to vary little between subjects and can thus be treated as a known parameter [40]. A nominal value of σ_CSF = 5σ_brain = 1.5 S/m is chosen.


pletely specifies how the same electrodes used for recording potentials react to dipole sources 
in the volume conductor of the same magnitude, direction and location [103]. This prin- 
ciple can be used to estimate conductivities both in vivo and in vitro, but its usefulness 
for in vivo measurements is that it is non-invasive and effective enough to work with small 
currents that do not harm the subject [40]. 
2.3.3.3 Skull


The skull is the most important tissue for volume conduction because, relative to all other tissues involved, it has the lowest conductivity. The conductivity of the skull was originally computed by [155] in 1968 using a phantom model of a skull immersed in an electrolytic tank. An average conductivity ratio of σ_skull = σ_brain/80 was deduced, and for many years this was accepted as the standard ratio for the 3-sphere model of the head, which ignored the effects of CSF. More recently this ratio has been disputed, with claims that in a 4-sphere model a more appropriate ratio is in the order of 1/20; reported values range from 1/15 [126] to 1/25 [87].

Another consideration is that the skull is itself composed of three layers (two compact layers of cortical bone at the inner and outer surfaces, and a spongy middle layer of cancellous bone), yet most literature to date treats the skull as a homogeneous and isotropic medium. In reality (ignoring bone malformations and other anomalies), whereas each of the three layers may be roughly approximated as isotropic and homogeneous, the average skull is neither. The conductivity of cancellous bone is roughly 3-6 times higher than that of compact bone, so currents are expected to be shunted, or re-directed, in the tangential direction. A better model of average skull conductivity is therefore one in which the radial and tangential conductivities differ. Also worth mentioning is that the overall skull resistance does not necessarily increase linearly with skull thickness, because thicker skulls tend to have thicker spongy layers, which are largely absent in thin skulls [122].
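Both effects follow from treating the skull as a stack of isotropic layers: radial current crosses the layers in series (resistances add), while tangential current flows along them in parallel (conductances add). The sketch below assumes hypothetical relative values (compact bone σ = 1, cancellous bone σ = 4, within the 3-6× range quoted above, and illustrative thicknesses in mm); it is not the anisotropic skull model used for the simulations in this chapter, and the helper names are hypothetical.

```python
def layered_conductivity(thicknesses, sigmas):
    """Effective (radial, tangential) conductivity of a stack of isotropic layers.

    Radial: layers in series, giving a thickness-weighted harmonic mean.
    Tangential: layers in parallel, giving a thickness-weighted arithmetic mean.
    """
    total = sum(thicknesses)
    radial = total / sum(t / s for t, s in zip(thicknesses, sigmas))
    tangential = sum(t * s for t, s in zip(thicknesses, sigmas)) / total
    return radial, tangential


def radial_resistance(thicknesses, sigmas):
    """Radial resistance per unit area of the stack (sum of t_i / sigma_i)."""
    return sum(t / s for t, s in zip(thicknesses, sigmas))


# Hypothetical relative values: compact bone sigma = 1, cancellous bone sigma = 4.
thick_skull = ([1.5, 3.0, 1.5], [1.0, 4.0, 1.0])  # compact / spongy / compact
thin_skull = ([1.5, 1.5], [1.0, 1.0])             # compact layers only

radial, tangential = layered_conductivity(*thick_skull)
print(radial, tangential)
print(radial_resistance(*thin_skull), radial_resistance(*thick_skull))
```

With these numbers the thick skull is twice as thick as the thin one but has only 1.25 times its radial resistance, and its tangential conductivity (2.5) exceeds its radial conductivity (about 1.6), so the average skull is anisotropic even though each layer is isotropic.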

The poor conductivity of the skull is expected to strongly attenuate and spatially average the potentials of sources beneath it. The work in this chapter uses an anisotropic model of the skull, with a nominal conductivity ratio (relative to σ_brain) of 1/20 in the radial direction and 1/4 in the tangential direction.


2.3.3.4 Scalp 


The scalp is composed of soft, fatty tissue with a conductivity similar to, or slightly higher than, that of brain tissue. It is treated as isotropic, with a nominal value of σ_scalp = σ_brain. Again, small differences between scalp and brain conductivities are ignored because it is the skull, and not these small discrepancies, that most affects the behavior of the volume conductor.


2.4 The EEG: A Macro-Scopic View of the Brain 


Both the type of sources and the volume conducting properties of the head 
have been discussed and we are now in a position to examine how EEG 
waveforms are generated. There is an important distinction that must be 
re-iterated when determining EEG waveform generation: 
1. Measurement: How does a source configuration (in the form of a dipole layer) result in the recorded potentials? This is due to volume conducting properties, which have been the focus of most of this chapter. Potential distributions due to volume conducting effects are a function of the source type and distribution, the electrical properties of the materials involved and the geometry of the volume conductor. For the purpose of calculating potentials, the current sources are assumed static and independent of the dynamics.


2. Dynamics: How is the source configuration generated? This is the study of the dynamics of the system, that is, how the underlying components of the system (neurons, minicolumns, macrocolumns) interact to generate a particular distribution of sources, which varies over time.


The volume conducting effects and the dynamics can be treated indepen- 
dently because the electrostatic equations presented in Section 2.1 operate at 
a much faster time scale than brain dynamics. Dynamic systems representa- 
tive of brain activity are discussed in detail in Chapter 6. Here the features 
of the EEG that are a direct consequence of the dynamics and not the vol- 
ume conducting properties are outlined, with particular interest given to the 
generation of the epileptic EEG. 


2.4.1 EEG Measurement 


In this section the EEG forward problem, given sources in the brain, is sim- 
ulated using the 4-layer concentric spheres model of the head (brain, CSF, 
skull, scalp) with conductivity and radii specified in Table 2.1 and Figure 2.10. 
Resulting potentials recorded at the cortex and at the scalp (corresponding to intra- and extra-cranial recordings respectively) are explored. Solutions for anatomically more realistic head models may be computed using finite or boundary element models (see, for example, [92]). However, such models are mainly useful for reducing errors in applications such as identifying intra-cranial sources from scalp recordings (the inverse problem); they do not change the qualitative observations targeted here.

The simulations are presented as normalized color-gradient diagrams that indicate how a unit dipole at any location in the volume conductor affects the recording location r_R. For example, Figure 2.14(a) shows a rectangular homogeneous medium with sources A-F located as shown. Assuming unit magnitude (Id = 1) and vertical orientation of the dipoles, Equation 2.7 indicates that the contributions of A-F to r_R depend only on the distance R and angle θ from source to r_R. Thus dipole A contributes more than dipole B, followed by C, D, etc., as shown in (b) and color coded in (a). The properties of the entire volume conductor can be color coded in this way by estimating the relative effects of unit dipoles at any location in the volume conductor, as shown in (c). From this diagram it can be concluded that the effects of a
FIGURE 2.14: Example computation of the contribution that dipoles at different locations within the volume conductor have on recording location r_R. (a) shows an example homogeneous volume conductor with sources A-F located as shown. In (b) the strengths of A-F, normalized relative to the strongest contributor (in this case located directly beneath the electrode), are shown. Dipole A contributes to Φ(r_R) more than dipole B, followed by C, etc. The relative contributions can be color-coded, as in (c), which shows estimates of the relative contributions of a dipole at any location within the volume conductor. (A log scale is used to differentiate between colors so that differences are more obvious.) Estimates of Φ(r_R) use Equation 2.7, with Id = 1 and (R, θ) defined as shown in (a).
(e) Bipolar, 60 degrees    (f) Bipolar, 5 degrees


FIGURE 2.15: Relative contributions of radial and tangential dipoles to the 
EEG at the cortex (a-b), and at the scalp (c-d). The areas affecting EEG at the 
scalp are much greater than those at the cortex due to the averaging effects of 
the skull. Bipolar referencing effectively doubles this area if the electrodes are 
distant (as in (e)) but significant cancellation effects can improve the spatial 
resolution if electrodes are sufficiently close (as in (f)). This is highlighted in 
Figure 2.16(a). Note that only relative magnitudes are given in (a-f).
(b) Relative strength of directional dipoles


FIGURE 2.16: The contributing area is defined as the area that provides 50% 
of the response at the electrode. Figure (a) illustrates that electrodes must 
have a significant angular separation before their contributing areas show no 
overlap. (b) illustrates that radial dipoles are relatively more important than 
tangentially oriented dipoles. 


dipole are strongest when θ = 0°, negligible when θ = 90°, and decrease with distance R.
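A contribution map like Figure 2.14(c) can be sketched for a homogeneous medium. Equation 2.7 is not reproduced in this section, so the sketch assumes the standard far-field expression for a current dipole in an infinite homogeneous conductor, Φ = Id cos θ / (4πσR²), with the electrode at the surface (z = 0) and vertical dipoles below it; the helper names and grid are hypothetical.

```python
import numpy as np


def dipole_potential(dipole_pos, recording_pos, moment=1.0, sigma=1.0):
    """Potential of a z-oriented current dipole in an infinite homogeneous medium.

    Assumes Phi = I d cos(theta) / (4 pi sigma R^2), where R is the
    source-to-electrode distance and theta the angle between the dipole axis
    and the source-to-electrode vector. Positions are (x, z) pairs and must differ.
    """
    r = np.asarray(recording_pos, dtype=float) - np.asarray(dipole_pos, dtype=float)
    dist = np.linalg.norm(r)
    cos_theta = r[-1] / dist  # z component over distance
    return moment * cos_theta / (4.0 * np.pi * sigma * dist**2)


def contribution_map(xs, zs, recording_pos=(0.0, 0.0)):
    """|Phi| at recording_pos from a unit dipole at each grid point, normalized to 1."""
    grid = np.array([[abs(dipole_potential((x, z), recording_pos)) for x in xs]
                     for z in zs])
    return grid / grid.max()


xs = np.linspace(-3.0, 3.0, 61)   # lateral positions
zs = np.linspace(-3.0, -0.5, 26)  # depths below the electrode
cmap = contribution_map(xs, zs)
log_map = np.log10(cmap)  # log scale, in the spirit of Figure 2.14(c)
```

The largest value falls directly beneath the electrode (θ = 0°), contributions shrink as cos θ / R², and a dipole level with the electrode (θ = 90°) contributes nothing.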

For the simulations in the remainder of this section the volume conductor is the 4-layer spherical head, and the computation is performed using the methodology described in [115]. Only radial and tangential dipoles are analyzed: even though cortical folding allows dipoles in all directions, in a spherical head any one dipole can be expressed as the sum of three components (one radial and two tangential).
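The decomposition is a change of basis to the local spherical unit vectors at the dipole's position. The sketch below assumes the sphere is centered at the origin and uses the physics convention for θ (polar angle from +z); `decompose_dipole` is a hypothetical helper.

```python
import numpy as np


def decompose_dipole(position, moment):
    """Split a dipole moment into one radial and two tangential components.

    position : dipole location (x, y, z) relative to the sphere centre
               (must not be the centre itself).
    moment   : dipole moment vector (x, y, z).
    Returns (components, basis): components = [radial, tangential_theta,
    tangential_phi], and the rows of basis are the local unit vectors
    r_hat, theta_hat, phi_hat, so that components @ basis rebuilds the moment.
    """
    x, y, z = np.asarray(position, dtype=float)
    theta = np.arccos(z / np.sqrt(x * x + y * y + z * z))  # polar angle from +z
    phi = np.arctan2(y, x)                                  # azimuth
    basis = np.array([
        [np.sin(theta) * np.cos(phi), np.sin(theta) * np.sin(phi), np.cos(theta)],
        [np.cos(theta) * np.cos(phi), np.cos(theta) * np.sin(phi), -np.sin(theta)],
        [-np.sin(phi), np.cos(phi), 0.0],
    ])
    components = basis @ np.asarray(moment, dtype=float)
    return components, basis


components, basis = decompose_dipole([0.03, 0.04, 0.05], [1.0, 2.0, 3.0])
print(components)  # radial, then the two tangential components
```

Because the three unit vectors are orthonormal, the potential of an arbitrary dipole is simply the weighted sum of the potentials of one radial and two tangential unit dipoles.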

By definition a voltage is never the result of a single electrode, but rather the difference between the potentials at the recording and reference sites. The location of this reference is a problematic issue in EEG recordings, and is discussed separately in Section 2.4.1.3.
2.4.1.1 Cortical (Intra-Cranial) Recordings


Figure 2.15(a) and (b) show the effects of radial and tangential dipoles on an intra-cranial recording site r_R. Because of symmetry, 2-D diagrams are presented, although computations were performed in 3-D. Regions have been enlarged to the area of interest, since the contributions of dipoles more than a few mm from a cortical electrode decrease rapidly to zero.

From these diagrams it is estimated that the surface area⁶ recorded by a point electrode on the cortex is in the order of 4-5 mm². Intra-cranial recordings are very localized, and sources beyond 2 mm contribute less than 10% of the total signal. Electrodes placed a few cm apart are capable of producing very different waveforms (unless both regions are acting similarly due to the dynamics). This has been observed experimentally in, for example, [123], where electrodes more than 2 cm apart showed almost zero coherence with one another.

Tangential sources have minimal contribution to the intra-cranial EEG 
because recording strength falls very fast with respect to distance. In any 
case, electrodes on the cortex can be assumed to be affected only by sources 
that are roughly normal to the surface because of the small areas that are 
involved in recordings. Tangential sources can thus be ignored. 

The potentials calculated here are for ideal point-sized electrodes, which do not exist in practice. In reality the electrode size has a significant effect on the area that affects cortical recordings, because the recorded potential is integrated over the entire electrode. Electrode sizes in intra-cranial recordings are rarely greater than 1 mm in diameter, and newer electrodes in the order of 0.01-0.1 mm are being fabricated. Example electrode dimensions can be found in Figure 5.1(b).




2.4.1.2 Scalp Recordings 


Figure 2.15(c) and (d) show the effects of radial and tangential dipoles on a scalp recording site r_R. The scalp EEG is affected by much larger areas of cortical activity. Let us define the effective cortical area as the area in which dipoles affect the recording site by at least 50% of the contribution of the radial dipole located directly beneath r_R⁷. In this case a measurement is affected by all dipoles in an area of ~20 cm², containing billions of neurons!

An important observation is that the commonly accepted notion that brain 
activity directly beneath the scalp electrode is the largest contributor to the 
recording is not necessarily true. 


⁶Surface area is a more appropriate measure than volume because of the previously proposed model of a dipole sheet representing cortical activity. Although radial dipoles of a particular depth are shown in Figure 2.15(c) and (d), there are no generators in the white matter directly beneath the cortex, and deeper sources are not radially aligned, so their contribution is minimal.

⁷This is known to engineers as the -6 dB point, at which the amplitude is halved (and the energy quartered).
The same recording can arise from one dipole directly beneath the electrode, one stronger dipole located some distance R from the electrode, or many weaker ones at distant locations. The increased sensitivity of scalp electrodes to radial dipoles directly beneath them does not mean that large potentials arise from sources that are located there.


This averaging over larger surfaces takes place because the poorly conducting skull tends to spread currents tangentially to the surface. Skull conductivities (as well as the inclusion of anisotropy) affect the calculated figures but do not qualitatively affect the results. A skull with lower conductivity tends to spread potentials more, so the area in question increases.

Radial dipoles affect the scalp EEG more than tangential dipoles, with their largest contribution arising directly beneath the electrode. A tangential dipole of the same magnitude contributes only 1/3-1/2 as much as an equivalent radial dipole (as pictured in Figure 2.16(b)) [122], with its maximum occurring at an angle that depends on the properties of the volume conductor. As a consequence of this, and because tangential dipoles are generally located deeper in the cortex, the effects of tangential dipoles are relatively small and are largely ignored in models of head volume conductors. The level of attenuation by the skull for each of these dipoles is not shown in these diagrams; only relative values are given.

Spatial averaging also implies that, for typical electrode sizes, the size of scalp electrodes does not significantly affect these observations.


Scalp EEG measures average over cortical areas in the order of 20 cm², whereas intra-cranial recordings respond to areas in the order of 4 mm². Radial dipoles are the major contributors to both; tangential sources are largely ignored because of their more distant location as well as their smaller contributions.


2.4.1.3 The Search for an Ideal Reference 


Discussions so far have ignored the contributions of a reference electrode, even though all recorded potentials depict differences between two points. That is,

$$V_{\mathrm{recorded}} = \Phi(\mathbf{r}_A) - \Phi(\mathbf{r}_B), \qquad (2.15)$$


74 Epileptic Seizures and the EEG 


where r_B is called the reference electrode site and r_A is the site of interest. In an ideal situation Φ(r_B) has no impact on V_recorded, so that V_recorded ≈ Φ(r_A). This can be achieved provided r_B is at an electrically neutral location (Φ(r_B) = 0) at a large electrical distance from r_A, in which case referencing issues become unimportant. Electrically distant in this case refers to a location sufficiently far away that the volumes contributing to Φ(r_A) do not affect Φ(r_B).

In reality r_B also needs to be somewhere that current can flow between r_A and r_B, because of conditions imposed by the recording equipment. This means that when recording the EEG, r_B cannot be, say, on a distant wall: it must lie on the subject's body. This makes it impossible to achieve electroneutrality in the reference for scalp recordings, because the distances involved are not large enough. The same is not true for intra-cranial recordings, because the volumes that affect each electrode are much smaller. It is well accepted that no quiet reference exists for scalp EEG, so it is important to understand the merits of different configurations, some of which are listed below.


• Linked ears/mastoids: Physically or mathematically linking the subject's ears is historically the best-known, yet fundamentally flawed, method of referencing. Referencing to the linked mastoids introduces a third site whose activity can influence the EEG. Recordings of this nature are rarely unbiased.


• Average reference: Subtracts the average of all recorded electrodes from each recording site. For sufficiently fine electrode sampling this is appropriate, because the large number of electrodes can accurately estimate the field distribution over the entire head and thus remove it from the recording site of interest. However, for relatively low spatial sampling (such as the standard 10-20 system in Figure 1.8(a)), subtracting the average introduces a bias due to imperfect estimation of the field. Average referencing should be used for high-resolution EEG only⁸.


• Bipolar referencing: Refers to the potential difference between two sites on the head⁹, generally used for electrodes close to one another in specific configurations, so as to emphasize voltage differences between hemispheres of brain activity (see Figure 1.8(b) and (c)) [118]. Figure 2.15(e) and (f) demonstrate that bipolar recordings can either significantly increase or significantly decrease the cortical surface area affecting recordings.


⁸Scales at this point might be getting confusing: if the effective cortical area of an electrode is ~20 cm², is this not adequate sampling? No: 20 cm² corresponds to a circle roughly 4-5 cm in diameter. The circumference of the head is in the order of 30 cm, which means the electrodes in the 10-20 configuration are at best about 6-7 cm apart. This is not adequate sampling for a correct estimate of average activity.

⁹Note that all recordings are bipolar in that they measure the difference in potentials between two sites. In EEG, bipolar referencing refers to the electroencephalographer's explicit acknowledgment of this fact.


EEG Generation and Measurement 75 


For electrodes far apart, as in (e), the cortical area affecting the EEG
doubles, but as electrodes come closer, as in (f), the effective area of
measurement decreases dramatically because of cancellation effects between
recording sites, as described in (g). Bipolar recordings with electrodes
closer than ~2 cm on the scalp do not improve the resolution further [31],
as also shown in (g). Bipolar recordings still involve large areas of
cortical activity, but cancellation effects reduce contributions from the
lowest-order spherical harmonics, effectively localizing the resulting
potential. For scalp EEG with few channels, bipolar recordings are the best
way to improve spatial resolution.


In practice all EEG electrodes are referenced to a single electrode at the 
time of acquisition, typically located on top of the head or behind the neck. 
Re-referencing is performed digitally post-acquisition. The effects of measure- 
ment noise must be considered when re-referencing takes place. 
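Digital re-referencing of this kind amounts to simple arithmetic on channels
that share the acquisition reference. The following is a minimal NumPy
sketch, assuming the data are stored as an array of shape (channels, samples)
with every channel recorded as V(site) - V(reference). The function names
`average_reference` and `bipolar` and the toy data are hypothetical, chosen
for illustration only.

```python
import numpy as np

def average_reference(v):
    """Subtract the instantaneous mean across channels from every channel.

    v: array of shape (n_channels, n_samples), all recorded against the
    same acquisition reference electrode.
    """
    v = np.asarray(v, dtype=float)
    return v - v.mean(axis=0, keepdims=True)

def bipolar(v, pairs):
    """Form bipolar derivations V(r_A) - V(r_B) for each (A, B) index pair."""
    v = np.asarray(v, dtype=float)
    return np.stack([v[a] - v[b] for a, b in pairs])

# Toy data (hypothetical): 4 channels x 5 samples of "brain" potential,
# plus activity at the acquisition reference that appears on every channel.
rng = np.random.default_rng(0)
brain = rng.normal(size=(4, 5))
ref_activity = rng.normal(size=5)
recorded = brain - ref_activity          # each channel is V(site) - V(ref)

avg = average_reference(recorded)
bip = bipolar(recorded, [(0, 1), (2, 3)])
```

Because every channel contains the same reference activity, that activity
cancels in both re-referenced versions, which is why re-referencing can be
done after acquisition.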

A measurement can only be recorded to the accuracy of the measurement
equipment. Some level of noise is expected, say

$$ \hat{V}(r_A) = V(r_A) + \eta_A, \qquad (2.16) $$

where $V(r_A)$ is the real signal and $\eta_A$ is the noise introduced by the
recording equipment. This noise is expected to have a particular distribution
determined by the specifications of the equipment as well as the measurement
environment.

Recorded signals can be re-referenced, in which case

$$ \hat{V}(r_{AB}) = \big(V(r_A) - V(r_B)\big) + (\eta_A - \eta_B). \qquad (2.17) $$

If $\eta_A$ and $\eta_B$ have similar distributions but there is no overlap
between them, the noise of $\hat{V}(r_{AB})$ can as much as double. In
practice some overlap does exist, because both signals $V(r_A)$ and $V(r_B)$
are recorded with respect to the same reference electrode. A typical EEG
measurement system limits its noise to < 2 µV, so a maximum of 4 µV for
bipolar re-referencing is acceptable when signals are on the order of 1 mV.
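The statement that the noise "can as much as double" can be checked
numerically. The sketch below reads "no overlap" as statistically independent
noise on the two channels (an interpretation, not a statement from the
source), and all amplitudes are illustrative. Independent noise grows by √2
in standard deviation, the worst-case bound for bounded noise doubles
(2 µV + 2 µV = 4 µV), and a noise component shared by both channels cancels
in the difference.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
sigma = 1.0   # illustrative per-channel noise standard deviation (uV)

eta_a = rng.normal(0.0, sigma, n)    # independent noise on channel A
eta_b = rng.normal(0.0, sigma, n)    # independent noise on channel B
shared = rng.normal(0.0, sigma, n)   # noise common to both (e.g. via the shared reference)

# Independent noise: variances add, so the standard deviation grows by sqrt(2).
indep_ratio = np.std(eta_a - eta_b) / sigma

# Worst case for bounded noise: |eta| <= 2 uV per channel gives
# |eta_a - eta_b| <= 4 uV, i.e. the bound doubles.
bound = 2.0
worst_case = bound + bound

# Half of each channel's noise power is shared; the shared part cancels,
# so the difference is no noisier than a single channel.
mix_a = np.sqrt(0.5) * eta_a + np.sqrt(0.5) * shared
mix_b = np.sqrt(0.5) * eta_b + np.sqrt(0.5) * shared
shared_ratio = np.std(mix_a - mix_b) / np.std(mix_a)

print(round(indep_ratio, 2), worst_case, round(shared_ratio, 2))
```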

For a more detailed discussion on the effects of referencing in EEG, refer 
to [122]. 


2.4.1.4 Spatial Filtering Properties of the Skull 


It is evident from Figure 2.15 that spatial averaging of sources occurs to a 
large degree on scalp recorded EEG. A way to explain this is by examining 
how the spherical harmonics are affected by the volume conductor. It was 
mentioned in Section 2.3.1 that the resulting potential at the scalp due to 
volume conducting effects can be expressed as a sum of these spherical har- 
monics or basis functions, as in Equation 2.13. In the spatial domain, this 
is analogous to a temporal signal being represented as the sum of sinusoids. 
Examples of these basis functions are described in Figure 2.11. 

When the distribution of source activity is broken down into spherical 


76 Epileptic Seizures and the EEG 


[Figure 2.17 plot: spatial filter transfer function (vertical axis, about 0 to 0.8) versus spatial scale n (horizontal axis, 5 to 25), with curves labeled "Monopolar Reference" and "Bipolar Reference".]


FIGURE 2.17: Spatial filter transfer function for scalp EEG for a bipolar 
reference, with electrodes far apart (in white), and a bipolar reference with 
electrodes close together (in gray). The latter improves spatial resolution by 
providing more attenuation to larger spatial scales and less to smaller ones. 


harmonics, those events that are synchronous over large areas can in general
be described with fewer terms in Equation 2.13. If many smaller-scale events
exist, more terms are necessary because finer spatial resolution is required
to describe them. More specifically, the harmonic number n is related to
spatial frequency as

$$ k = \frac{n}{2\pi r_{\mathrm{scalp}}}, \qquad (2.18) $$

where n is the harmonic number, k is the spatial frequency and
$r_{\mathrm{scalp}}$ is the radius of the sphere. This can be verified in
Figure 2.11(c), where higher n produce basis functions that describe the
spatial domain at higher resolution; spatial scales are inversely related to
spatial frequencies. In a sphere of radius $r_{\mathrm{scalp}} = 9.2$ cm, as
for the head, n = 1, 2, 3, 4 represent spatial scales of approximately 58,
29, 19 and 14 cm respectively.
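Equation 2.18 and the quoted spatial scales can be reproduced with a few
lines of Python. This sketch takes the spatial scale of harmonic n to be
1/k = 2πr_scalp/n, which is how the figures of 58, 29, 19 and 14 cm arise for
r_scalp = 9.2 cm; the function names are illustrative.

```python
import math

R_SCALP_CM = 9.2  # head radius used in the text

def spatial_frequency(n, r=R_SCALP_CM):
    """Eq. (2.18): spatial frequency (cycles per cm) of spherical harmonic n."""
    return n / (2 * math.pi * r)

def spatial_scale(n, r=R_SCALP_CM):
    """Spatial scale (cm): the inverse of the spatial frequency, 2*pi*r / n."""
    return 1.0 / spatial_frequency(n, r)

for n in (1, 2, 3, 4, 8, 12):
    print(n, round(spatial_scale(n), 1))
```

The same function gives roughly 7 cm for n = 8 and just under 5 cm for
n = 12, the scales referred to in the discussion of Figure 2.17.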

A spatial transfer function can be derived that plays the same role in the
spatial domain as a frequency-domain transfer function does in time: where
the latter describes the attenuation of each sinusoid, the former describes
how events occurring at the spatial scale of each spherical harmonic are
attenuated. Note that a spatial transfer function does not tell us how much
of the signal is a result of a particular spatial scale. Instead it says that
if an event covers an area of a particular spatial scale, then it is
attenuated according to the relationship shown by the transfer function.

The spatial transfer functions for bipolar referencing related to Figure 
2.15(e) and (f) are shown in Figure 2.17. These were calculated as described 
in [172], and can be interpreted as follows: 


• Bipolar references with distant electrodes (denoted monopolar in this
figure) emphasize global scales much more than local scales; that is,
events that span a larger area of the cortex contribute more to the EEG
than events that are local in nature. Events at spherical harmonic
numbers n > 12, that is, at spatial scales of about 4 cm or less,
contribute minimally to the scalp EEG. This is supported experimentally
in [31].


• Bringing the bipolar electrodes close together removes energy from global
events. Furthermore, higher spherical harmonic numbers (approximately
n > 8, or spatial scales of less than 7 cm) contribute more to the EEG
than they do with a distant reference electrode. This means that bringing
pairs of electrodes closer together improves the spatial resolution.


The above observations are qualitatively captured in Figure 2.15(e) and
(f), where the recording with the electrodes in (e) is affected by a greater
brain area than that with the closely spaced electrodes in (f). Spherical
harmonic numbers of approximately n < 3 are unlikely to be relevant in the
EEG because they correspond to events occurring over more than a 20 cm
radius of cortex. Events of this magnitude are not frequently observed in
the brain, except in special cases such as epileptic seizures where the
entire brain is recruited. In these instances both methods of bipolar
recording have very large contributions from global events and show large
waveforms, although the magnitude of these waveforms is much larger for
bipolar referencing with distant electrodes than with nearby ones. This
phenomenon is observed in real EEG waveforms.
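Why a closely spaced electrode pair attenuates global scales can be seen in a
deliberately simplified, one-dimensional picture. The sketch below assumes a
potential pattern cos(nθ) around a great circle of the head (radius 9.2 cm)
and ignores the CSF, skull and scalp entirely, so it is not the model of
[172] used for Figure 2.17; it only shows the geometric cancellation.
Differencing two points separated by an angle d gives an amplitude of
2|sin(nd/2)|, which is small for low n when d is small. The 2.5 cm spacing is
a hypothetical value.

```python
import numpy as np

R_CM = 9.2  # head radius from the text

def bipolar_gain(n, separation_cm, r=R_CM):
    """Amplitude of cos(n*theta) differenced between two points on a ring.

    cos(n*(theta + d)) - cos(n*theta) has amplitude 2*|sin(n*d/2)|,
    where d = separation / r is the angular separation in radians.
    """
    d = separation_cm / r
    return 2.0 * np.abs(np.sin(np.asarray(n) * d / 2.0))

n = np.arange(1, 26)
close = bipolar_gain(n, 2.5)   # hypothetical 2.5 cm electrode spacing
for k in (1, 3, 8, 12):
    print(k, round(float(close[k - 1]), 2))
```

For the 2.5 cm spacing the gain is roughly 0.27 at n = 1 but close to 2
around n = 12, so the pair suppresses global patterns relative to local ones.
In this idealized picture the gain falls again once the spatial scale 2πr/n
approaches the electrode separation.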


The CSF, skull and scalp act as a spatial filter between cortex 
and scalp. Synchronous activity in the brain that spans large 
cortical areas is attenuated less than activity that affects smaller 
areas. Scalp recordings are influenced more by global events. 
Bipolar referencing can improve spatial sensitivity.


An important byproduct of the different transfer functions in Figure 2.17 is
the amount of spatial sampling required to accurately represent the potential
distribution in the head. Engineers will be familiar with the Nyquist sampling
theorem for the temporal domain, which states that to accurately represent
the frequency content of a signal the sampling rate must be at least twice
the highest frequency present. If this is not done, the recorded signal is
aliased, a type of distortion that cannot be undone by any signal processing
technique. The same is true in the spatial domain.
To accurately represent the potential distribution on the head,
the spatial sampling frequency must be at least twice the
highest spatial frequency. Enough electrodes need to be placed
on the head to give an accurate description of the potential distribution.
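Spatial aliasing behaves exactly like its temporal counterpart. In this
sketch (a one-dimensional ring of equally spaced electrodes, an assumption
made for simplicity) a pattern with 7 cycles around the ring is sampled by 32
and by 10 electrodes. With 10 electrodes, fewer than the 14 that two samples
per cycle would require, the pattern masquerades as a 3-cycle pattern.

```python
import numpy as np

def dominant_cycles(samples):
    """Cycles around the ring of the strongest non-DC DFT component."""
    spectrum = np.abs(np.fft.rfft(samples))
    spectrum[0] = 0.0                     # ignore the mean (DC) term
    return int(np.argmax(spectrum))

n_true = 7                                # true pattern: 7 cycles around the ring
for n_electrodes in (32, 10):
    theta = 2 * np.pi * np.arange(n_electrodes) / n_electrodes
    v = np.cos(n_true * theta)            # potential seen by each electrode
    print(n_electrodes, dominant_cycles(v))
# prints "32 7", then "10 3"
```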


In the temporal domain it is easy to ensure that the sampling frequency
is appropriate by low-pass filtering the signal before sampling, removing
frequencies above half the sampling rate. This cannot be done in the spatial
domain. However, the highly resistive skull ensures that small scales do not
influence the EEG significantly, so the number of electrodes required to
accurately represent the potential distribution on the human head is still
reasonable. This is not so at the cortex, where nature supplies no such
pre-filtering. Dense arrays with electrodes every centimeter (if not closer)
are required to accurately represent the potential distribution at the
cortical level.
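The spatial Nyquist criterion can be turned into a rough electrode count.
The sketch below assumes that the smallest relevant scale is one spatial
wavelength, that electrodes form a square grid, and that they cover the upper
hemisphere of a 9.2 cm sphere (area 2πr²). These are simplifying assumptions,
so the numbers are order-of-magnitude only.

```python
import math

def max_spacing(smallest_scale_cm):
    """Spatial Nyquist: at least two samples per smallest spatial wavelength."""
    return smallest_scale_cm / 2.0

def electrodes_needed(spacing_cm, r_cm=9.2):
    """Rough count for a square grid covering the upper hemisphere (area 2*pi*r^2)."""
    area = 2.0 * math.pi * r_cm ** 2
    return math.ceil(area / spacing_cm ** 2)

for scale in (5.0, 4.0):                  # assumed smallest relevant scalp scales (cm)
    s = max_spacing(scale)
    print(scale, s, electrodes_needed(s))
```

With a smallest relevant scale of 4-5 cm on the scalp, the spacing must be
about 2-2.5 cm and the count comes out at roughly 86 to 133 electrodes, the
same order as the 64-128 electrodes of high-resolution EEG.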


For a sufficiently sampled scalp it is possible to reduce both the spatial
averaging and the influence of the reference site by using the inverse spline
Laplacian. In this method the effects of the skull are removed by spatial
deconvolution, reducing the contribution of large spatial scales and
improving the resolution of small ones. The resulting dura image is then an
estimate of potentials at a spatial resolution between that of physical
cortical recordings and scalp EEG, obtained without the need for surgery.
However, studies have shown that the standard 10-20 electrode placement
system is spatially under-sampled [121]. This does not mean that the signals
are unrepresentative of the activity in the brain, but it does mean that
spatial deconvolution is not possible because of spatial aliasing.
High-resolution EEG, with 64-128 electrodes, is necessary. For details on
this type of work, including limitations not mentioned here, refer to, for
example, [172].
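The spline Laplacian itself is too involved for a short example, but the idea
of a reference-free spatial high-pass filter can be illustrated with a
simpler, different technique: the nearest-neighbour (Hjorth) Laplacian, which
subtracts from each electrode the mean of its four neighbours on a regular
grid. This sketch assumes a square electrode grid, which real caps do not
have, and it performs no deconvolution to the dura. It only shows that
uniform and linearly varying (global) potentials are removed while a local
bump is retained.

```python
import numpy as np

def hjorth_laplacian(v):
    """Nearest-neighbour (Hjorth-style) surface Laplacian on a square grid.

    v: 2-D array of potentials on a regular electrode grid. Each interior
    electrode gets v[i, j] minus the mean of its four neighbours; edge
    electrodes (lacking four neighbours) are returned as NaN.
    """
    v = np.asarray(v, dtype=float)
    out = np.full_like(v, np.nan)
    neighbours = (v[:-2, 1:-1] + v[2:, 1:-1] + v[1:-1, :-2] + v[1:-1, 2:]) / 4.0
    out[1:-1, 1:-1] = v[1:-1, 1:-1] - neighbours
    return out

# A global (linear) field plus a local bump at the centre of a 7 x 7 grid.
x, y = np.meshgrid(np.arange(7.0), np.arange(7.0))
global_field = 3.0 + 0.5 * x - 0.2 * y
local = global_field.copy()
local[3, 3] += 1.0
print(np.round(hjorth_laplacian(local)[2:5, 2:5], 2))
```

Adding a constant to every electrode, as a change of reference would, leaves
the result unchanged.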


2.4.2 EEG Dynamics 


This chapter has explicitly separated the concepts of EEG measurement
(approximated by a dipole sheet model in a volume conductor) and EEG source
generation. The latter is a consequence of the dynamics of the brain,
dictated by the network configuration of interacting neurons, synaptic
actions and reactions, and external stimuli provided by sensory information.
An alternative approach to that presented thus far is to first develop a
dynamic system model, calculate the potentials generated by the model, and
then calculate the resulting fields using the volume conduction theory
provided herein. Although this is the ultimate goal (a complete understanding
of brain dynamics and potential field generation together), to date even the
most developed models are incapable of representing the dynamics of the brain
as a whole, an issue that is discussed extensively in Chapter 6.


Because dynamics can be separated from volume conducting effects,
the task of explaining the EEG is greatly simplified. Volume
conducting effects are well understood, whilst the study of brain
dynamics is in its infancy.


Although the temporal dependence of typical network topology and behavior
was omitted in Section 2.2, it is the essence of brain dynamics. This is not
to say that temporal aspects are unimportant in EEG measurement; for example,
synchronous activation of many neurons is required to generate fields of
consequence in the EEG. However, it is the dynamic properties of the system
that allow temporal synchronization to eventuate.

Phase shifts are another phenomenon that can be explained through dynamics.
Neither the properties of the volume conducting model nor experimental
observation has demonstrated any phase shifts (or delays) between
measurements taken simultaneously intra-cranially and at the scalp [31].
However, phase shifts between different scalp locations do exist [122].
Again, this is a consequence of the dynamics of the system rather than the
volume conducting properties: the phase shifts arise from the time required
for activity to propagate from one region of the brain to another.

High frequencies (~1 kHz) are not observed in the EEG of the cortex because
the EEG captures the dendritic response, which is much slower than the time
variations associated with action potentials. Moreover, the EEG only records
a meaningful response when a large number of synchronous events come
together, and the faster the phenomenon, the less likely such events are to
reinforce one another.

Perhaps the most misunderstood concept is that of temporal filtering between
cortex and scalp EEG. Nothing in the volume conductor model prevents higher
frequencies from being observed at the scalp, provided enough of the cortex
is active at that frequency (see Section 2.3.2), yet the frequencies observed
at the scalp are lower than those observed intra-cranially. Where does this
temporal filtering occur? Again, it is a consequence of the interaction
between dynamics and volume conductor. The spatial filter between the cortex
and scalp selectively transmits activity that is synchronous and active in a
sufficiently large region of cortex. However, high frequency activity
synchronizes over large areas less often, because it is more difficult for
short events to phase-lock and constructively interfere. Hence a consequence
of the spatial filter is to temporally filter the EEG between cortex and
scalp. The dynamics also play a role, in that propagation delays in axon
fibers suppress high frequency behavior over large cortical areas.
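The role of propagation delays can be quantified in a simple case. Assuming
unit-amplitude sources at one frequency whose delays are spread uniformly
over an interval T (a hypothetical 20 ms here), the amplitude of their sum
relative to perfectly in-phase sources is |sinc(fT)| in NumPy's normalized
convention, so higher frequencies are attenuated more. The sketch compares a
Monte Carlo estimate with this closed form.

```python
import numpy as np

def coherent_fraction(freq_hz, delays_s):
    """|mean(exp(-j*2*pi*f*tau))|: amplitude of the summed field relative to
    perfectly in-phase sources, for unit sinusoids delayed by tau."""
    phasors = np.exp(-2j * np.pi * freq_hz * np.asarray(delays_s))
    return float(np.abs(np.mean(phasors)))

rng = np.random.default_rng(2)
spread = 0.02                                  # hypothetical 20 ms spread of delays
delays = rng.uniform(0.0, spread, 100_000)

for f in (2, 10, 40):
    simulated = coherent_fraction(f, delays)
    closed_form = abs(np.sinc(f * spread))     # np.sinc(x) = sin(pi x) / (pi x)
    print(f, round(simulated, 2), round(closed_form, 2))
```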

EEG dynamic modeling is a very active area of research and will continue
to be so for a long time to come. The most developed models to date
(described in Chapter 6) are capable of describing activity at the spatial
scale of the macrocolumn. Larger-scale dynamics have been addressed, but the
models remain tentative. As such, for the co-registration or validation of
brain dynamic models using EEG waveforms, it does not make sense to use
recordings measured at the scalp, since they are intrinsically global in
nature. Instead, research of this nature should concentrate on intra-cranial
recordings, which take measurements at the millimeter rather than the
centimeter scale.


2.4.3 Epilepsy and the EEG


Since our focus is on epilepsy, it is instructive to determine the
configurations of sources that could create epileptic EEG waveforms. The
trends observed during a seizure are synchronous activity over many channels,
a sudden or progressive onset of this activity, and oscillations that may be
large or small (but well pronounced). Some of these observables are shown in
Figure 1.10. The first two can only be explained by the system dynamics: no
mechanism of the volume conductor can affect the synchronicity over large
regions of cortex, nor can it produce the perceived sudden onset and offset
of epileptic activity. The resonances that result, both temporally and
spatially, are a function of the neuronal network topology. These issues are
addressed in more detail in Chapter 6.

However, the volume conducting effects play a role in the large ampli- 
tude oscillations. What is the underlying activity of the neurons, given that 
epilepsy is accepted as a global phenomenon that affects large spatial scales? 
Large voltages at the scalp can occur for several reasons: 


1. Events covering large spatial scales contribute larger voltages to scalp 
EEG, as discussed in Section 2.4.1.4. Epilepsy is a global event. 


2. Synchronous activity contributes high voltages to the scalp EEG, as 
discussed in Section 2.2.3. The oscillatory nature of the EEG suggests 
that many neurons fire in synchrony in an on-off manner at a frequency 
characteristic of the seizure. 


3. Increased random (asynchronous) activity of neurons may also increase the
average recorded potential.


So what are the neurons doing during an epileptic seizure? The answer
is probably a combination of the above. fMRI studies have shown that
metabolic activity increases throughout the seizure [134], although the
relatively low temporal resolution of fMRI equipment makes it impossible to
discern whether this happens in an on-off manner.




The regularity of epileptic EEG waveforms as well as the fact that large 
amplitudes are recorded in both intra-cranial and scalp EEG alike suggests 
that increased activity cannot be the sole factor. The range of amplitudes 
that can occur in an epileptic seizure is more easily explained by the level of 
synchrony in the neural population. Smaller amplitude seizures simply have 
a smaller proportion of neurons firing in synchrony. This is important in a 
neural model; it is not necessary to propose that many neurons are firing, only 
that a higher proportion is synchronously active. 
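The argument that synchrony, rather than the number of active neurons, sets
the amplitude can be illustrated with phasors. This sketch assumes identical
unit-amplitude oscillators at the seizure frequency; a fraction of them share
one phase and the rest have random phases. The synchronous part grows in
proportion to its size, whereas the random part grows only as its square
root.

```python
import numpy as np

def field_amplitude(n_neurons, frac_sync, rng):
    """Amplitude of the summed field of unit-amplitude oscillators at one
    frequency: a fraction frac_sync share phase 0, the rest have random phases."""
    n_sync = int(round(frac_sync * n_neurons))
    phases = np.concatenate([np.zeros(n_sync),
                             rng.uniform(0.0, 2 * np.pi, n_neurons - n_sync)])
    return float(np.abs(np.sum(np.exp(1j * phases))))

rng = np.random.default_rng(3)
N = 10_000
print("all asynchronous:    ", round(field_amplitude(N, 0.0, rng)))
print("twice as many, async:", round(field_amplitude(2 * N, 0.0, rng)))
print("10% synchronous:     ", round(field_amplitude(N, 0.1, rng)))
```

With 10,000 oscillators, 10% synchrony gives an amplitude near 1,000, whereas
fully asynchronous activity gives an amplitude of the order of √N = 100, and
doubling the number of asynchronous oscillators raises it only by about √2.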

It is well known that for epilepsies with gradual onset the cortical EEG 
registers epileptic activity near the focus well before any can be seen on the 
scalp EEG. A significant area of cortex must be recruited to the seizure before 
scalp recordings show a change. This may mean that scalp recordings are 
unsuitable for analysis of onset phenomena and seizure spread, particularly if 
generalization of the seizure is fast. 


2.5 Conclusions 


The aim of this chapter is to discuss what the EEG is capable of saying about 
the underlying brain activity. This has been addressed by looking at what the 
EEG measures, that is, the forward problem. 


Radially oriented, parallel current dipoles determine the mea- 
sured EEG, both at scalp and intra-cranial locations. Sources 
must be synchronously active for EEG magnitudes to be large. 


EEG predominantly observes cortical sources because the cortex is the only
structure in the brain capable of generating similarly oriented, synchronous
activity in a consistent manner. Studies of the dynamic interaction between
cortical and sub-cortical structures need to employ different recording
strategies, since the scalp or cortical EEG gives no information about the
sub-cortex. Furthermore, the cortical regions that affect the EEG the most
are those on the gyri, because this is where parallel sources are most
typically radially oriented in the head.

The EEG is responsive to synchrony rather than activity. Low magni- 
tudes do not necessarily imply little activity. Techniques responsive to activity 
rather than synchrony include fMRI [121]. 
The EEG recorded at the scalp is a spatial average of a large
area of cortical activity. A consequence of this and previous
observations is that equal waveforms can arise from


1. A few dipoles (current sources) near the electrode, or 
2. Many dipoles at distal locations, or 

3. A few synchronously active dipoles, or 

4. Many dipoles that are asynchronous, or 

5. A few dipoles of large magnitude, or 

6. Many dipoles of small magnitude, 


or any combination of the above. 


Hence recordings at the scalp give a very poor spatial representation of the
underlying brain activity, since so many different configurations can
potentially generate the same waveforms. Points (1) and (2) are not as
important for intra-cranial EEG because it provides greater spatial
resolution. It is the intra-cranial EEG that will likely prove most useful
for modeling applications. However, the invasiveness of the recording
procedure means that healthy human data are unlikely ever to be available,
and care must be taken that models do not represent a clinical condition
rather than normal brain activity.

Theoretically it is always possible for a single dipole alone to generate the
potentials recorded in the EEG, regardless of its orientation or
synchronicity. However, the magnitude of this dipole would need to be large
enough to mask the activity of all other sources that are synchronized and
similarly oriented. Research suggests that the potentials required to mask
all other activity are much higher than those found in the brain [31]. It is
much more likely that moderately low but synchronous activity over large
areas is the major contributor to the EEG.


The scalp EEG provides an estimate of the spatially filtered 
activity in the cortex. Any perception of temporal filtering be- 
tween cortex and scalp is a consequence of this spatial filtering 
and dynamics of the brain, and not due to the volume conduc- 
tor. 
Dynamic models based on scalp EEG should carefully differentiate between
global and local phenomena. The dynamic variables used should be appropriate
for the spatial scales involved. This is not as important for the description
of global phenomena such as epilepsy or the alpha rhythm, but models designed
at the mesoscopic level that claim to describe the scalp EEG should be
interpreted with care.

Spatial resolution can be improved with high spatial sampling, as discussed 
in Section 2.4.1.4, but data of this nature are frequently unavailable. 


The spatial resolution of EEG signals acquired with low spatial
sampling can be improved by using bipolar recordings between 
electrodes close to one another. 


EEG traces should always be interpreted as bipolar: sources at both electrode
locations (whether synchronous, parallel, spanning large spatial scales,
etc.) affect the EEG. Digital EEG has enabled re-referencing of EEG signals
post-acquisition. The most appropriate referencing system depends on the
application.

We have seen that the EEG signal is limited in what it can tell us because
it is a crude image of what is happening in the brain. This supports our
original rough estimate, in the Preface, that 1 bit of information is gained
every second for every 10? neurons in the brain. The spatial average removes
much of the detail, particularly in the scalp EEG, where at most we can only
get information about every 20 cm² of cortex at a time. Coupled with our
understanding of what EEG signals measure, we can safely say that the EEG
cannot reveal much detail about the micro- or even mesoscopic activity in
the brain. However, from a purely phenomenological point of view the EEG
is very useful because patterns do exist in recorded traces. Signal processing
methods that allow us to extract statistical information about the EEG signal
are discussed next.
2.A Units of Electric Quantities


Below is a list of many of the quantities related to bioelectromagnetism that 
have been used in this chapter, along with some of the units typically used to 
measure them. All units listed for each quantity are equivalent, and can be 
used to draw relationships between these quantities. 


Quantity             Symbol   Units

Electric field       E        Newtons per coulomb (N/C)
                              Volts per meter (V/m)
Electric potential   φ, Φ     Volts (V)
                              Joules per coulomb (J/C)
                              Newton-meters per coulomb (N·m/C)
Electric current     I        Amperes (A)
                              Coulombs per second (C/s)
Resistance           R        Ohms (Ω)
                              Volts per ampere (V/A)
Conductivity         σ        Siemens per meter (S/m)
                              Inverse electric resistance per meter (1/(Ω·m))
Resistivity          σ⁻¹      Volt-meters per ampere (V·m/A)
Capacitance          C        Farads (F)
                              Seconds per ohm (s/Ω)
                              Coulombs per volt (C/V)
Permittivity         ε        Farads per meter (F/m)


2.B Volume Conductor Boundary Conditions 


This is a derivation of the boundary conditions in Equations 2.10 and 2.11. 

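First let us look at Equation 2.10. The inhomogeneous volume conductor
shown in Figure 2.18(a) is composed of two homogeneous materials with
conductivities $\sigma_{m1}$ and $\sigma_{m2}$ respectively. Because each
material is linear, that is, its electrical properties are uniform within,
the effective resistance of each (per unit cross-sectional area) is a
function of the length $\Delta n$ of the material, in this case
$R_{m1} = \Delta n/\sigma_{m1}$ and $R_{m2} = \Delta n/\sigma_{m2}$. By Ohm's
law,

$$ I_{m1} = \frac{\Delta\Phi_{m1}}{R_{m1}} = \sigma_{m1}\frac{\Delta\Phi_{m1}}{\Delta n} \qquad (2.19) $$

and

$$ I_{m2} = \frac{\Delta\Phi_{m2}}{R_{m2}} = \sigma_{m2}\frac{\Delta\Phi_{m2}}{\Delta n}, \qquad (2.20) $$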
FIGURE 2.18: Diagrams to help derive Equations 2.10 and 2.11. The boundary
between two homogeneous volume conductors with conductivities $\sigma_{m1}$
and $\sigma_{m2}$ is shown. In (a), the effective resistances through the
media, by Ohm's law, are $R_{m1} = \Delta n/\sigma_{m1}$ and
$R_{m2} = \Delta n/\sigma_{m2}$, with $\Delta\Phi_{m1} = (\Phi_B - \Phi_A)$
and $\Delta\Phi_{m2} = (\Phi_C - \Phi_B)$. Assume currents originate from a
closed circuit, not shown. In (b), a loop around the boundary is drawn. By
Kirchhoff's law the potential differences around this loop sum to zero;
$\Delta\Phi_{n1}$ and $\Delta\Phi_{n2}$ are taken along the two sides normal
to the boundary, and $\Delta\Phi_{m1}$ and $\Delta\Phi_{m2}$ along the
tangential sides in media m1 and m2. Refer to Appendix 2.B for derivations.


where $\Delta\Phi_{m1} = (\Phi_B - \Phi_A)$ and $\Delta\Phi_{m2} = (\Phi_C - \Phi_B)$.
As a consequence of conservation of charge (charge cannot appear or
disappear), the currents flowing across the boundary must be equal. Thus,

$$ I_{m1} = I_{m2} \qquad (2.21) $$

$$ \sigma_{m1}\frac{\Delta\Phi_{m1}}{\Delta n} = \sigma_{m2}\frac{\Delta\Phi_{m2}}{\Delta n}. \qquad (2.22) $$

By taking the limit as $\Delta n \to 0$,

$$ \sigma_{m1}\frac{\partial\Phi_{m1}}{\partial n} = \sigma_{m2}\frac{\partial\Phi_{m2}}{\partial n}, \qquad (2.23) $$

as in Equation 2.10.
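Equation 2.23 can be checked numerically on a one-dimensional chain of
resistors. The sketch assumes two media in series, each split into segments
of length Δn with resistance Δn/σ per unit area, fixes the potentials at the
two ends and solves Kirchhoff's current law at the interior nodes. The
conductivities 0.33 and 0.0042 S/m are illustrative values only. The product
σ ∂Φ/∂n comes out the same in both media, while the gradient itself jumps by
the ratio σ_m1/σ_m2.

```python
import numpy as np

def solve_chain(sigmas, dn, v_left=0.0, v_right=1.0):
    """Node potentials of a 1-D chain of conducting segments in series.

    sigmas: conductivity of each segment; dn: segment length. Each segment
    has resistance dn / sigma (per unit area). End potentials are fixed and
    the interior nodes obey Kirchhoff's current law.
    """
    g = np.asarray(sigmas, dtype=float) / dn      # conductance of each segment
    n_nodes = len(g) + 1
    A = np.zeros((n_nodes, n_nodes))
    b = np.zeros(n_nodes)
    A[0, 0], b[0] = 1.0, v_left                   # fixed potential at the left end
    A[-1, -1], b[-1] = 1.0, v_right               # fixed potential at the right end
    for i in range(1, n_nodes - 1):               # net current into node i is zero
        A[i, i - 1] = g[i - 1]
        A[i, i] = -(g[i - 1] + g[i])
        A[i, i + 1] = g[i]
    return np.linalg.solve(A, b)

# Hypothetical two-medium conductor: 5 segments of sigma_m1, then 5 of sigma_m2.
sigma_m1, sigma_m2, dn = 0.33, 0.0042, 0.1
sigmas = np.array([sigma_m1] * 5 + [sigma_m2] * 5)
phi = solve_chain(sigmas, dn)
grad = np.diff(phi) / dn                          # dPhi/dn in each segment
flux = sigmas * grad                              # sigma * dPhi/dn
print(np.round(flux, 6))
```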

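For Equation 2.11 refer to Figure 2.18(b), where a path (loop) is drawn
across the boundary. This loop has length $\Delta n$ and width $\Delta t$. By
the principle of conservation of energy, Kirchhoff's voltage law states that
the sum of the potential differences around any closed loop in an electrical
circuit is zero. Thus,

$$ \Delta\Phi_{n1} + \Delta\Phi_{m1} + \Delta\Phi_{n2} + \Delta\Phi_{m2} = 0, \qquad (2.24) $$

where $\Delta\Phi_{n1}$ and $\Delta\Phi_{n2}$ are the potential differences
along the two sides of the loop normal to the boundary, and $\Delta\Phi_{m1}$
and $\Delta\Phi_{m2}$ are those along the tangential sides in media m1 and m2,
each taken in the direction of traversal, as shown.

To observe the change of potential with respect to $\Delta t$, the tangent,
this can be rewritten as

$$ \frac{\Delta\Phi_{n1}}{\Delta t} + \frac{\Delta\Phi_{m1}}{\Delta t} + \frac{\Delta\Phi_{n2}}{\Delta t} + \frac{\Delta\Phi_{m2}}{\Delta t} = 0. \qquad (2.25) $$

Next take the limit $\Delta t \to 0$. By symmetry and conservation of charge
$I_{n1} = -I_{n2}$, the resistances of both paths normal to the boundary are
the same, and thus $\Delta\Phi_{n1} = -\Delta\Phi_{n2}$. Terms 1 and 3 in the
above equation therefore cancel. As a consequence,

$$ \frac{\Delta\Phi_{m1}}{\Delta t} = -\frac{\Delta\Phi_{m2}}{\Delta t}. \qquad (2.26) $$

Because the two tangential sides are traversed in opposite directions, the
right-hand side is the change of $\Phi_{m2}$ along the same tangential
direction as $\Delta\Phi_{m1}$, so that

$$ \frac{\partial\Phi_{m1}}{\partial t} = \frac{\partial\Phi_{m2}}{\partial t}, \qquad (2.27) $$

as in Equation 2.11.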


2.C Capacitance in RC Circuits 


This is a worked example of the conditions under which capacitive effects in 
a circuit such as that shown in Figure 2.13 can be ignored. This relates to 
the electrical properties of biological tissue discussed in Section 2.3.2. Basic 
knowledge of circuit analysis is assumed in this example. 

In Figure 2.13 it is shown that the frequency-dependent capacitance is
given by $C(f) = \varepsilon(f)/L$ and the frequency-dependent resistance by
$R(f) = L/\sigma(f)$ (both per unit cross-sectional area), where L is the
linear distance from point A to point B in the tissue and f is the frequency
of interest. Ignoring the frequency dependence of R and C for the moment, the
equivalent impedance Z of the circuit in Figure 2.13 is given by

$$ Z = R \parallel \frac{1}{sC} = \frac{R}{1 + sRC}, \qquad (2.28) $$

where $s = 2\pi f j$ with $j = \sqrt{-1}$, and $\parallel$ denotes that R and C
are in parallel. The voltage between points A and B is, by Ohm's law,

$$ V_{AB} = ZI = \frac{RI}{1 + sRC}, \qquad (2.29) $$

where I is the current passing through the medium, shown as a current
source in Figure 2.13. This voltage can be rewritten as

$$ V_{AB} = RI\,\frac{1 + sRC - sRC}{1 + sRC} = RI\left(1 - \frac{sRC}{1 + sRC}\right). \qquad (2.30) $$

This says that the voltage $V_{AB}$ is approximately equal to that of a purely
resistive circuit ($V_{AB} \approx RI$) if the magnitude of the second term,
$\left\| \frac{sRC}{1 + sRC} \right\|$, is small. This quantity is thus
representative of the relative error in $V_{AB}$ when the medium is assumed
to be purely resistive.

To make this error small it is sufficient to make
$\|sRC\| = \|2\pi f RC j\|$, and consequently $2\pi f RC$, small. Substituting
$R = L/\sigma(f)$ and $C = \varepsilon(f)/L$ gives $RC = \varepsilon(f)/\sigma(f)$,
which results in the condition that $2\pi f \varepsilon(f) \ll \sigma(f)$ must
hold if capacitive effects are to be ignored.
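The condition can be evaluated for any assumed tissue properties. The sketch
computes the relative error |sRC/(1 + sRC)| = x/√(1 + x²), with
x = 2πfε/σ and the permittivity written as ε = ε_r ε₀. The conductivity and
relative permittivity in the example are placeholders, to be replaced with
measured values for the tissue of interest.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity, F/m

def resistive_error(freq_hz, sigma, eps_r):
    """|sRC / (1 + sRC)| with RC = eps / sigma: relative error of the purely
    resistive approximation V_AB ~ R*I (Eq. 2.30)."""
    x = 2 * math.pi * freq_hz * eps_r * EPS0 / sigma   # |sRC| = 2*pi*f*eps/sigma
    return x / math.sqrt(1 + x * x)

# Placeholder tissue values: sigma = 0.3 S/m, relative permittivity 1e5.
for f in (10, 100, 1000):
    print(f, resistive_error(f, sigma=0.3, eps_r=1e5))
```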


3 


Signal Processing in EEG Analysis 


“It is easy to lie with statistics. It is hard to tell the truth without
it.”


- Andrejs Dunkels, Swedish mathematics teacher (1939-1998) 


The EEG is a collection of recorded signals that represent the electrical 
activity in the brain. From Chapter 2 we have an idea about how these signals 
are generated and how they are recorded. We have seen examples of what they 
look like, and we have seen that they can be used to tell us something about 
what is happening in the brain system. This chapter presents some of the 
mathematical tools available to interpret these records, that is, it explores the 
signal processing commonly used to analyze EEG records. 

Signal processing is a way in which the EEG record can be converted 
into a numerical description of features in the data. For example, numbers 
that describe the energy, frequency content or complexity of a signal can be 
computed. These features are used to unmask information that is not visually 
obvious, or to compress vast amounts of data to a more manageable level of 
selected information. Compression is important for EEG analysis because 
many hours of recording lead to gigabytes of data. 

Often the extracted features are better viewed as statistics. Formally, a 
statistic is defined as “a fact or piece of data obtained from the study of a large 
quantity of numerical data" [124]. Statistical analysis is easily (and therefore 
frequently) unintentionally misused, misinterpreted and misreported. It is im- 
portant as researchers to strive for correct implementation and interpretation 
— all statistical analysis techniques are limited, and these limitations must be 
understood and reported. It is often just as important to know what a
statistic cannot say as to know what it can say. This depends on the context
of the signal, in this case the fact that the signal is an EEG.

The aims of this chapter are (1) to present common signal processing tools, 
and (2) to outline their limitations given that we are using EEG signals to 
interpret the brain system. Although the methods presented are applicable 
to a broad spectrum of problems, signal processing techniques relevant to 
epileptic seizure detection and prediction are the main focus. A broader, less 
specific survey of EEG signal analysis can be found in, for example, [118] or 
[181]. 

Both detection and prediction of epileptic seizures are examples of clas- 
sification problems. The former classifies between two states, seizure versus 
non-seizure, whereas the latter includes a third pre-seizure state indicative of
an imminent epileptic event. Neither seizure detection nor seizure prediction
is a new problem. Simple seizure detection methods have been around for
decades, although their sophistication increased significantly in the age of the 
digital EEG. The original analog amplitude detectors have evolved into com- 
plicated systems involving many levels of computation. Research in seizure 
detection remains relevant because of the search for ‘optimal’ performance that 
can emulate that of a human expert. The first attempt at seizure prediction 
using EEG can be traced back to the 1970s [70], although poor performance 
meant that efforts were quickly abandoned. In the 1980s spatial and temporal 
patterns of interseizure events were analyzed. The study of seizure predic- 
tion really flourished in the 1990s, in particular through the use of non-linear 
systems theory. 

A complete detection and prediction system can be divided into the four 
stages outlined in Figure 4.1, namely preprocessing, feature extraction, clas- 
sification and expert system. This chapter focuses on signal processing of 
EEG signals, which is most relevant to the first two. However the selection of 
the type of signal processing should be aligned with the remaining stages of 
classification and expert systems. These are discussed separately in Chapter 
4. 

Preprocessing is the process in which the EEG is prepared for analysis. 
The signal processing in this area involves the removal of unwanted aspects,
such as artifact and high frequency content, and normalizing the EEG data so 
that it is comparable to all other data (e.g., normalize the amplitude range, 
sampling frequency, etc). Preprocessing is discussed in Section 3.2. 

Feature extraction is the process whereby the relevant statistics or features
are extracted from the EEG. In the case of seizure detection and prediction,
relevant information refers to features that are capable of distinguishing
between non-seizure, pre-seizure and seizure states. This is perhaps the most
important component of the classifier, since without appropriate features the
classifier cannot perform well. Feature extraction is discussed in Section 3.3.

For seizure detection and prediction it is essential that the extracted
features characterize the temporal evolution of the EEG so that changes can be
detected at the appropriate time scale¹. This brings us to a fundamental
problem in the design of a detection system: we want to capture change, but
change is problematic because most signal processing techniques require
stationarity, that is, they require that the statistical distribution of the
signal (such as that shown in Figure 3.2) remains constant over time.

Any EEG is an observation from an intrinsically non-stationary system.
Simply looking at the very different morphologies of the signals during ‘sleep’
and ‘awake’ shown in Figure 1.9 highlights this point. The question is then:
how easily can these changes be detected? How fast can we conclude that a
transition has taken place? If an observation window of a certain duration is
required for the correct computation of a statistic, but the EEG changes at
a much faster rate, does the statistic become useless? How must results be
interpreted? Windows must be short enough that the EEG can be assumed to
remain roughly stationary and that temporal resolution remains high², yet long
enough that the signal processing techniques are useful. These trade-offs are
discussed throughout.

¹ An appropriate time scale may be seconds for detection and minutes for prediction.

To avoid rewriting signal processing textbooks, a certain level of mathematical
knowledge is assumed of the reader, although an effort is made to keep this
to a minimum and to make the explanations as self-contained as possible.
For readers less familiar with the relevant mathematics but interested in
learning more, the standard textbooks [58], [105] and [86] may be useful.
Otherwise, reading the introductions to each section, the summaries and the
boxed comments should be sufficient to obtain a conceptual understanding.

Before delving into descriptions of the different stages of a classification
system, the EEG is introduced as a mathematical construct, and the conventions,
notations and assumptions used throughout the text are presented.


3.1 Mathematical Representation of the EEG 


An EEG is a set of recordings taken from C_TOT electrodes that can be placed
on the scalp, on the surface of the cortex or deeper within the brain. In this
text a continuous time signal x_c(t) is used to denote the temporal evolution
of the voltage at each location, where t represents time and c = 1, ..., C_TOT
is the electrode/channel number. These are the true signals generated by the
brain system.

The digital EEG approximates the continuous time signal x_c(t) by sampling
it at discrete time points spaced at an interval Δ. That is

$$x_c[n] = x_c(n\Delta), \qquad n = 1, 2, 3, \ldots \qquad (3.1)$$

Here x_c[n] is the recorded discrete time signal that samples x_c(t) at times
t = Δ, 2Δ, ..., nΔ. The discrete time is enumerated as n = 1, 2, 3, ....
Once digitized we only have access to x_c[n], the assumption being that the
sampling interval is small enough (equivalently, that the sampling frequency
is large enough) that x_c[n] closely represents the relevant properties of x_c(t).
Aspects of this are discussed in Section 3.2.
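As a minimal sketch of Equation 3.1, assuming NumPy and a hypothetical 10 Hz sinusoid standing in for the true signal x_c(t), the samples x_c[n] are simply the continuous signal evaluated at t = nΔ. The sampling frequency F_s = 512 Hz matches the value used in the figures of this chapter.

```python
import numpy as np

def sample(x_continuous, delta, n_samples):
    """Return x_c[n] = x_c(n * delta) for n = 1, 2, ..., n_samples (Equation 3.1)."""
    n = np.arange(1, n_samples + 1)   # the text enumerates discrete time from n = 1
    return x_continuous(n * delta)

def x_true(t):
    """Hypothetical 'true' signal x_c(t): a 10 Hz sinusoid."""
    return np.sin(2 * np.pi * 10.0 * t)

fs = 512.0                         # sampling frequency F_s in Hz
delta = 1.0 / fs                   # sampling interval Delta in seconds
x_n = sample(x_true, delta, 512)   # one second of samples
```

Python arrays are indexed from 0, so x_n[0] holds the text's x_c[1] = x_c(Δ).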

In Chapter 2 we saw that x_c(t), and consequently x_c[n], are affected by
activity in networks of hundreds of thousands of neurons. The behavior of these
networks over time, known as the dynamics of the brain, can be described by a
set of mathematical equations expressed in terms of ‘hidden’ variables known
as the (dynamic) state, here denoted z[n]. The dimension of the state is
assumed finite³, say N. That is, we assume z[n] ∈ R^N, where R^N is the phase
space of the system. The time series x_c[n] is a one-dimensional recording
that is not able to describe z[n] in its entirety without further manipulation.
Methods exist that allow at least some of the properties of z[n] to be
reconstructed from the single time series x_c[n]. These are discussed in more
detail in Section 3.3.4.

² Here we are talking about time scales much larger than those presented in Chapter 2.

First let us look at how the relationship between x_c[n] and z[n] may be
conceived. A discrete-time system can be described by [159, 183]


$$\mathcal{B}: \quad x_c[n] = B_c\big(z[n]\big), \quad n = 1, 2, 3, \ldots, \qquad z[n+1] = P\big(z[n], \kappa[n], u[n], n\big). \qquad (3.2)$$


The above may define the behavior B of a single EEG channel x_c[n] at
times n = 1, 2, 3, .... This description informs us that x_c[n] is generated by
a dynamical system with input u[n], parameters κ[n] and state z[n]. The
relationship between these quantities is determined by the maps P and B_c: P
is the state transition map and B_c is the output map for channel index c.
The input is an unconstrained signal that in the brain could represent sensory
information (e.g., vision, hearing), could be set to zero or could be a ‘white
noise’ signal when describing endogenous activity.

The map P: R^N → R^N describes how to get from the current state z[n] at time
n to the next state z[n + 1] at time n + 1. A time series z[n] for n = 1, 2, 3, ... is
generated by repeated applications of the map P beginning with some starting
condition z[0]. This time series is known as an orbit or trajectory of the system
[159].

If the map P can be written as a constant matrix, then Equation 3.2 describes
a linear system; otherwise it is non-linear. If Equation 3.2 is used to describe
a completely stochastic system, then z[n] defines the distribution of the state
of the system, and P describes how this distribution changes over time.

P depends on a set of parameters (κ[n]) representative of physiology (e.g.,
network topology, chemical concentrations), as well as on inputs to the system
u[n] (e.g., visual and auditory stimuli), which together determine the behavior
of the system. The parameters, the inputs and the map itself can all change
over time and are functions of time n. Stochastic or random processes can be
incorporated into Equation 3.2 via u[n] or as a property of P itself. This, along
with the nature of and assumptions made about P, κ[n] and u[n], is discussed in
further detail in Chapter 6. For this chapter it is sufficient to speak about the
map P as an arbitrary (but appropriate) description of the dynamics, from
which measurements can be taken. Here, P can be viewed as the dynamical
system called the brain.

³ More realistically, given that many of the phenomena in the brain are related to the
transport of chemicals, heat, energy or electrical signals, the state would be more
appropriately represented as infinite dimensional.

The measured output or observable taken from the system state z[n], in our
case the EEG signal x_c[n], is also defined in Equation 3.2. The signal is
derived from z[n] by the map B_c: R^N → R. Here c is the channel index
that corresponds to a particular recording location on the head. This map
largely depends on the geometry and the electrical properties of the materials
in the head and, as described in Chapter 2, is, unlike P, assumed stationary
and independent of inputs u[n].
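The structure of Equation 3.2 can be made concrete with a toy example. The sketch below is purely illustrative and is not a model of the brain: it assumes NumPy, a two-dimensional state (N = 2), a hypothetical map P built from a constant, slightly damped rotation matrix (so, in the sense of the text, the system is linear), a white-noise input u[n] scaled by a constant parameter κ[n], and a hypothetical output map B_c that mixes the two state variables. Indices start at 0, as is usual in Python.

```python
import numpy as np

def simulate(P, B_c, z0, u, kappa):
    """Iterate z[n+1] = P(z[n], kappa[n], u[n], n) from z[0] = z0 and
    record the observation x_c[n] = B_c(z[n]) at every step."""
    z = np.asarray(z0, dtype=float)
    x = np.empty(len(u))
    for n in range(len(u)):
        x[n] = B_c(z)                    # output map B_c: R^N -> R
        z = P(z, kappa[n], u[n], n)      # state transition map P: R^N -> R^N
    return x

fs = 512.0
theta = 2 * np.pi * 10.0 / fs            # rotation per step: a ~10 Hz oscillation
A = 0.99 * np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

def P_linear(z, kappa_n, u_n, n):
    # Constant matrix A acting on the state plus a scaled input: a linear system.
    return A @ z + kappa_n * np.array([u_n, 0.0])

def B_c(z):
    # Hypothetical output map: this channel sees a fixed mix of the state variables.
    return 0.7 * z[0] + 0.3 * z[1]

rng = np.random.default_rng(0)
u = rng.standard_normal(1024)            # 'white noise' input (endogenous activity)
kappa = np.full(1024, 0.1)               # constant parameters
x = simulate(P_linear, B_c, [1.0, 0.0], u, kappa)
```

Setting kappa to zero removes the input, and the observation then decays as a damped 10 Hz oscillation determined entirely by z[0]. Making P depend on n, or on z[n] in a non-linear way, would turn the same loop into a time-varying or non-linear system.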


In this chapter a single recorded EEG channel is represented as
a time signal x_c[n]. n is the time index, with x_c[n] representing x_c(nΔ),
which samples a real signal x_c(t) using a sampling interval Δ.
c = 1, 2, ..., C_TOT is an index representing one of the C_TOT EEG
channel locations.


x_c[n] is a single discrete time measurement taken from the
behavior of the brain, generated by a complex dynamical system
with state z[n] that is N dimensional. Mathematically the
behavior B of the resultant x_c[n] can be expressed in terms of
the dynamics of z[n] as in Equation 3.2.


The mathematical conventions introduced here are used throughout the
book. However, in this chapter it is only necessary to acknowledge that a brain
system described by z[n] exists, that an EEG signal x_c[n] is an observation
taken from this high dimensional system and that signal processing is used in
an attempt to extract properties of z[n] from x_c[n]. More than one channel
(C_TOT > 1) may be analyzed at a time.


3.2 Preprocessing 


Preprocessing in a classification system involves the preparation of the raw
EEG data, x_c[n], so that they are ready for feature extraction. This requires (1)
making sure that the data are appropriately sampled, referenced and filtered,
(2) normalizing the data so that they comply with a particular standard and (3)
dealing with unwanted artifact.

Appropriate sampling is dealt with by the recording system (such as that
described in Chapter 5) at the time of acquisition. This involves abiding by
the Nyquist criterion, which states that a sampling rate of F_s Hz can only
correctly represent signals that contain frequencies up to F_s/2. If frequencies
higher than F_s/2 are present then a distortion known as aliasing occurs. The
effects of aliasing cannot be removed by any form of signal processing. This is
demonstrated in Section 3.3.2.
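A small numerical check of this statement, assuming NumPy and the F_s = 512 Hz used throughout the chapter: a hypothetical 300 Hz cosine lies above the Nyquist frequency F_s/2 = 256 Hz, and once sampled its values are identical to those of a 212 Hz (512 − 300 Hz) cosine, so no later processing can tell the two apart.

```python
import numpy as np

fs = 512.0                        # sampling frequency F_s in Hz
t = np.arange(512) / fs           # one second of sample times
f_high = 300.0                    # above the Nyquist frequency fs / 2 = 256 Hz
f_alias = fs - f_high             # 212 Hz, the frequency it is mistaken for

x_high = np.cos(2 * np.pi * f_high * t)
x_alias = np.cos(2 * np.pi * f_alias * t)

# With 1 s of data the FFT bins are 1 Hz apart; the spectral peak of the
# sampled 300 Hz signal appears at the alias frequency instead.
spectrum = np.abs(np.fft.rfft(x_high))
peak_hz = np.fft.rfftfreq(len(t), d=1.0 / fs)[np.argmax(spectrum)]
```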

If the true signal x_c(t) contains frequencies higher than F_s/2 prior to
sampling, these components must be removed. This is known as pre-filtering of
the signal. In practice a pre-filter removes frequencies larger than F_s/2.5 to
avoid edge effects [122]. Re-sampling of the raw signal to a new (smaller) F_s
also requires pre-filtering.
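The order of operations when re-sampling can be sketched as follows, assuming NumPy. The pre-filter here is an ideal (‘brick-wall’) low-pass implemented by zeroing FFT bins, chosen only because it is short and exact for this toy signal; it is a stand-in, not the filter a recording system would use. The cutoff follows the F_s/2.5 rule of thumb quoted above, applied to the new sampling frequency.

```python
import numpy as np

def fft_lowpass(x, fs, cutoff_hz):
    """Ideal low-pass: zero every FFT bin above cutoff_hz (illustration only)."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    X[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(X, n=len(x))

def downsample(x, fs, factor):
    """Re-sample to fs / factor: pre-filter at (fs / factor) / 2.5, then
    keep every factor-th sample."""
    new_fs = fs / factor
    filtered = fft_lowpass(x, fs, new_fs / 2.5)
    return filtered[::factor], new_fs

fs = 512.0
t = np.arange(512) / fs
# 10 Hz component of interest plus a 100 Hz component that would alias at 128 Hz.
x = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 100 * t)

y, new_fs = downsample(x, fs, 4)   # new_fs = 128 Hz, cutoff = 51.2 Hz
naive = x[::4]                     # no pre-filter: 100 Hz folds down to 28 Hz
```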

Normalization here refers to the process by which data are converted to
a form that can be compared to other data acquired using different recording
equipment or taken from different people. For example, consider two sets of
EEG with similar phenomena but acquired using two separate recording
systems. System 1 encodes the recorded signal in the range [0, 10] and system 2
in the range [−20, 20]. The amplitudes of these two measurements are not
directly comparable. However, if the data are normalized to a common range
(e.g., [0, 1]) prior to comparison, then both signals may be compared directly.
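A minimal sketch of this range normalization, assuming NumPy and assuming that each system's encoding range is known; the ranges [0, 10] and [−20, 20] are the ones from the example above. Mapping each system's range onto [0, 1] makes equivalent readings coincide.

```python
import numpy as np

def to_unit_range(x, lo, hi):
    """Linearly map values encoded in the range [lo, hi] onto [0, 1]."""
    return (np.asarray(x, dtype=float) - lo) / (hi - lo)

system_1 = to_unit_range([0.0, 5.0, 10.0], lo=0.0, hi=10.0)
system_2 = to_unit_range([-20.0, 0.0, 20.0], lo=-20.0, hi=20.0)
```

If the encoding range is not known, the observed minimum and maximum of the data are a common substitute, at the cost of making the result sensitive to outliers such as artifact.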

A standard way to normalize a signal is to


1. Remove its mean: The EEG is a measure of relative voltage, and as such
it is a quantity that can only be defined up to a constant (see Chapter 2).
The mean of a single recording does not provide useful information and
can interfere with analysis. Therefore it is best to detrend the signal by
removing its mean prior to analysis. If there are multiple simultaneous
recordings then all channels should be detrended by the same constant.
This preserves their relative means, which may contain information. Removal
of a common constant is implicit when bipolar referencing is applied.


2. Scale to unit variance: To normalize a detrended signal to unit variance
it must be divided by its standard deviation, making the resulting signal
scale invariant. Indeed the absolute size of the signals in an EEG record
is meaningless, as it depends on the gain of the measurement amplifier.
Scaling to unit variance therefore makes records measured on different
equipment comparable to one another. Both variance and standard deviation
are defined in Section 3.3.1.
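The two steps can be sketched as below, assuming NumPy and an array of shape (C_TOT, number of samples) for multichannel data. Following item 1, a single constant (the grand mean over all channels) is removed so that relative channel means survive. For item 2 the sketch also divides every channel by one common standard deviation, which keeps relative amplitudes between channels; that choice is an assumption, and dividing each channel by its own standard deviation is the other obvious option.

```python
import numpy as np

def normalize(x):
    """Remove the mean and scale to unit variance.

    For a 1-D signal this is the usual zero-mean, unit-variance scaling.
    For a 2-D array (channels x samples) the mean and standard deviation
    are taken over all values, so every channel is shifted by the same
    constant and scaled by the same factor."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()          # one constant for all channels
    return x / x.std()        # one common scale factor

rng = np.random.default_rng(0)
single = normalize(3.0 + 2.0 * rng.standard_normal(2048))
multi = normalize([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
```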


Normalization is an important process that applies both to the raw data (in
the preprocessing stage) and to processed data (in the feature extraction stage).
It is discussed in more detail in Section 3.3.

Once the EEG signal is acquired and normalized, the referencing scheme
used can affect the localization of EEG activity, as discussed in Chapter 2.
For the purpose of seizure detection and prediction, bipolar referencing is most
often used to improve spatial resolution, that is, to emphasize the differences
in activity between electrodes. Differences in activity at different sites
are important for the identification of seizure activity. Re-referencing can be
applied post-acquisition.
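Re-referencing after acquisition can be sketched as follows, assuming NumPy and a hypothetical chain of electrodes stored in order as rows of an array (clinical bipolar montages define specific electrode pairs, which this toy chain does not attempt to reproduce). Each bipolar channel is the difference between neighboring electrodes, so any signal common to all of them, including the original reference, cancels.

```python
import numpy as np

def bipolar_chain(X):
    """Bipolar derivation along an ordered chain of electrodes.

    X has shape (C_TOT, n_samples); row c of the result is X[c] - X[c + 1],
    so the output has C_TOT - 1 channels."""
    X = np.asarray(X, dtype=float)
    return X[:-1] - X[1:]

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 512))          # four hypothetical referential channels
common = rng.standard_normal(512)          # activity shared by every channel
B = bipolar_chain(X + common)              # the shared component cancels
```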

Finally, artifact must be dealt with. Several strategies exist to deal with 
the types of artifact outlined in Chapter 1. These include: 


1. Ignoring: Assume that the feature extraction methods are only mini- 
mally affected by the artifact. 


2. Rejecting: Artifact is (automatically) identified and channels or epochs 
that are contaminated are excluded from analysis. 


3. Removing: Artifact is again identified and if possible removed from
the EEG signal by separation methods such as independent component
analysis (ICA) [69] or wavelet filtering (described later). Less data are
lost than through rejection.


4. Training: The system is trained to identify and cope with common
artifact. This shifts the responsibility to the classification and
expert system stages described in Chapter 4.


Seizure detection and prediction algorithms often employ all of the above.
Most algorithms deal with simple artifact such as electromagnetic interference
using strategy 3: the frequencies at which interference occurs are removed
from the signal. Weak artifacts such as those caused by heart activity are
ignored because their influence is assumed to be small.
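Strategy 3 for electromagnetic interference can be sketched, assuming NumPy, by zeroing the FFT bins around the mains frequency and its harmonics. The 50 Hz default (60 Hz in North America), the ±1 Hz width and the three harmonics are illustrative assumptions; practical systems typically use dedicated notch filters instead of editing the spectrum of a whole record.

```python
import numpy as np

def remove_line_noise(x, fs, line_hz=50.0, width_hz=1.0, harmonics=3):
    """Zero the FFT bins within width_hz of line_hz and its first harmonics."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    for h in range(1, harmonics + 1):
        X[np.abs(freqs - h * line_hz) <= width_hz] = 0.0
    return np.fft.irfft(X, n=len(x))

fs = 512.0
t = np.arange(512) / fs
eeg_like = np.sin(2 * np.pi * 10 * t)                     # component to keep
recorded = eeg_like + 0.8 * np.sin(2 * np.pi * 50 * t)    # plus 50 Hz interference
cleaned = remove_line_noise(recorded, fs)
```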

Ocular artifacts (EOG) are in most cases ignored because they are as- 
sumed to have a minimal effect on signal processing methods. Although this 
assumption is not necessarily true, it is convenient because rejection results 
in the loss of large amounts of data (ocular artifacts occur very often) and 
removal (e.g., by using ICA, as discussed in [78] and [79]) is computationally 
cumbersome. At most a classifier may be trained to expect interference
from EOG.

Muscle artifacts (EMG) are also very common, particularly in scalp EEG,
because there are muscles close to the electrodes. Rejection is thus not a
practical option, as too much data would be lost. Removal is a better solution
but is a difficult task because EMG frequencies overlap with normal and seizure
EEG frequencies, particularly but not exclusively in the 15-20 Hz range [55, 128].
The EMG separation process uses wavelet filters (described in Section 3.3.3)
rather than conventional filters because wavelet filters avoid distortion
effects [152, 151]. Epochs of heavy muscle artifact may be rejected altogether.

The more sophisticated detection algorithms employ strategy 4 and
selectively train the system to cope with artifact (see [170] and [156] as
examples). This training typically includes examples of common artifact as well
as alpha rhythms, which, although not artifactual in nature, are often
responsible for incorrect classification of the EEG.
3.3 Feature Extraction


Feature extraction is key to the performance of a classifier. Because the field
of signal processing is enormous, the theory presented here is restricted to what
is most relevant to EEG analysis, and to epilepsy in particular. Methods valid
only for long-term studies are omitted, because epileptic events are short and
it is their temporal evolution that must be represented.


The problem under investigation in this text is: what signal
processing is able to extract a feature ζ_s from x_c[n] that can
be used to differentiate between seizure, non-seizure and, in the
case of prediction, pre-seizure EEG?


The choice of an appropriate discriminating feature requires a deep un- 
derstanding of the problem [49]. 


Multiple features can be used together, in which case the
classifier has access to a feature set (ζ_1, ζ_2, ..., ζ_S). More
features are not necessarily better: each ζ_s must introduce
additional information to improve the discriminating power of
the feature set. Spurious addition of features can confuse results,
degrade performance and prohibit efficient computation, as shown in [185].


Before delving into the specifics of feature computation, several general
concerns particular to the EEG are addressed below.


3.3.0.1 Computing Statistics: Averages vs. Instances 


Statistics are computed from large quantities of data because any one instance
of the raw data can vary significantly. For example, no two EEG sequences
will ever look the same. Even the EEG of the same person, doing the same
thing but recorded at a different time, will be different. However, there is an
expectation that if the person is doing the same thing then the EEGs will
indeed look similar. We postulate that our observations are drawn from a
large ensemble of possible observations, and that this ensemble has some
well-defined features allowing us to infer the probability of a certain
observation. More often than not we also postulate that observing a signal
over a long period of time will allow us to infer the properties of the
ensemble. This is called ergodicity.

FIGURE 3.1: To observe the sinusoid in (a) it is necessary to look at average
behavior over a suitable time frame, particularly when noise is present. If
only a few samples are available the sinusoidal pattern is not evident. When
computing statistics the length of the analysis window is also important, as
shown in (b), where the statistic of interest is the mean. Using only a few
samples the estimated mean is nowhere near the real value. Only when the
analysis window is longer does the estimate approach the real mean. In this
figure the sampling frequency is F_s = 512 Hz, so that 1 second is equivalent to
N = 512 samples.

We can see this with two examples. First refer to Figure 3.1(a), where a
stand-alone sinusoid is plotted. Any one sample or instance in time isolated
from all other samples does not tell us much: it could belong to any signal.
However, if observations are made in the context of a larger time frame, the
sinusoid becomes evident. This is also true when noise is added, as shown.
On average this noisy signal is a sinusoid.

Next examine Figure 3.2, where two signals that differ from one another are
plotted. Even though their raw values are not the same, their empirical
distributions of amplitude are very similar. The empirical distribution is
computed by counting the relative number of samples that lie within each of the
amplitude bins shown. This is known as the probability distribution function
or PDF. The two signals are said to be samples or instances of data sequences
that are drawn from an ensemble with a particular statistical distribution.

FIGURE 3.2: The two signals in (a) and (b) are different because their raw
values are not the same. However, when we look at the probability distribution
function (PDF) of each signal we see that their distributions are roughly similar.
The two signals are said to be samples or instances of data sequences that
belong to the same statistical distribution, on average. The PDF is computed
by summing the number of samples that fall within each marked bin and
dividing by the total number of samples in the signal. Thus it is a measure of
the relative frequency of occurrence. The process is explained later in Section
3.3.4, Equation 3.42. In this figure the sampling frequency is F_s = 512 Hz, so
that 1 second is equivalent to N = 512 samples.
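The relative-frequency computation described above can be sketched with NumPy's histogram function. The two signals here are hypothetical: independent draws of Gaussian noise, which differ sample by sample but come from the same distribution, so their binned distributions are expected to be similar. The 16 bins spanning the combined range of both signals are an arbitrary choice.

```python
import numpy as np

def empirical_pdf(x, edges):
    """Fraction of samples of x that fall in each amplitude bin."""
    counts, _ = np.histogram(x, bins=edges)
    return counts / len(x)

rng = np.random.default_rng(2)
a = rng.standard_normal(512)     # one second at F_s = 512 Hz
b = rng.standard_normal(512)     # a different instance, same distribution

edges = np.linspace(min(a.min(), b.min()), max(a.max(), b.max()), 17)
pdf_a = empirical_pdf(a, edges)
pdf_b = empirical_pdf(b, edges)
```

Because the bin edges span every sample (NumPy includes the right-most edge in the last bin), each distribution sums to one.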


To say anything meaningful about a signal or system we study 
averages. These averages can be taken over time, over space or 
over a distribution. 


However, we are rarely in a position to compute averages precisely, as we
do not know the ‘true’ distribution of the signal, or worse, there may not be
a ‘true’ distribution. It is then very important to consider how we actually
compute a statistic. How many data points were included? Did we make any
assumptions about measurement errors and the environment? Is the instance
from which the feature is extracted representative of the overall behavior? The
interpretation of a statistic must consider the specifics of the calculation. Refer
to the example shown in Figure 3.1(b), where the statistic of interest is the
mean amplitude of a signal. Here the estimated mean is shown for different
observation times or window lengths. The longer the observation time, the
more closely the estimated statistic reflects the true value. However, if we do
not know the true mean of the signal, then we cannot know whether the feature
was extracted using a sufficiently long observation time. What is the
significance of a computed mean if it is not known that the computation is
correct?
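The dependence of an estimate on window length, as in Figure 3.1(b), can be reproduced with a noise-free toy signal, assuming NumPy: a hypothetical 1 Hz sinusoid offset by a true mean of 1, sampled at F_s = 512 Hz. Windows covering a whole number of periods recover the true mean, while a quarter-period window is far off.

```python
import numpy as np

fs = 512
n = np.arange(4 * fs)                              # four seconds of samples
true_mean = 1.0
x = true_mean + np.sin(2 * np.pi * 1.0 * n / fs)   # 1 Hz sinusoid around the true mean

def windowed_mean(x, N, k=0):
    """Estimate the mean from the N samples starting at index k."""
    return x[k:k + N].mean()

for N in (16, 128, 512, 2048):
    print(f"N = {N:4d}: estimated mean = {windowed_mean(x, N):.3f}")
```

Adding noise to x only slows this convergence. With real EEG the true mean is unknown, which is precisely the difficulty described above.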


3.3.0.2 Noise 


Any measurement can be expressed as the sum of the ‘true signal’ plus some
noise. In the case of the EEG signal this is

$$x_c[n] = x_c(n\Delta) + \eta(n\Delta), \qquad n = 1, 2, 3, \ldots \qquad (3.3)$$

where x_c(nΔ) is the value of the true signal x_c(t) at time t = nΔ (where
Δ is the sampling interval), and η(nΔ) is the error of this measurement at
this time. Here η(nΔ) is known as the measurement noise because it is the
quantity by which the recorded signal x_c[n] differs from the true signal x_c(nΔ).
These differences arise from errors in the acquisition process, which cannot be
removed, or from artifact, the part of the measured EEG that is not caused
by the neural activity in the brain. If η(nΔ) is small then x_c[n] ≈ x_c(nΔ).

In practice we have the freedom to determine what we call signal x_c(nΔ)
and what we call noise η(nΔ). For example, consider once more a noisy
sinusoid similar to that shown in Figure 3.1(a). If for the purposes of analysis
we are only interested in frequencies below 3 Hz, we can assign anything higher
than this frequency to noise. The signal x_c[n] can then be decomposed as in
Figure 3.3(a), and η(nΔ) can be discarded prior to analysis. Similarly, imagine a
situation where absolute amplitudes larger than 0.75 are of no interest. In
this case x_c[n] can be decomposed as in Figure 3.3(b). Noise and signal are
in the eye of the beholder and they depend on the application.
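Both decompositions of Figure 3.3 can be sketched with NumPy. The test signal is hypothetical (a 1 Hz sinusoid plus a smaller 20 Hz component), the frequency split uses an ideal FFT mask at 3 Hz, and the amplitude split reads ‘absolute amplitudes larger than 0.75 are of no interest’ as clipping, that is, the part of each sample beyond ±0.75 is assigned to noise; that reading is an assumption. In both cases signal plus noise reconstructs the recording, as Equation 3.3 requires.

```python
import numpy as np

def split_by_frequency(x, fs, cutoff_hz=3.0):
    """Figure 3.3(a): everything above cutoff_hz is labeled noise."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    signal = np.fft.irfft(np.where(freqs <= cutoff_hz, X, 0.0), n=len(x))
    return signal, x - signal

def split_by_amplitude(x, limit=0.75):
    """Figure 3.3(b): the part of each sample beyond +/- limit is labeled noise."""
    signal = np.clip(x, -limit, limit)
    return signal, x - signal

fs = 512.0
t = np.arange(512) / fs
x = np.sin(2 * np.pi * 1 * t) + 0.3 * np.sin(2 * np.pi * 20 * t)

s_a, noise_a = split_by_frequency(x, fs)
s_b, noise_b = split_by_amplitude(x)
```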

Another type of error is that caused by the manipulation of x_c[n]. When a
statistic is estimated under sub-optimal conditions and this estimate deviates
from the true value, the deviation is known as computational noise. Computational
noise arises from inappropriate representation of data at the signal processing
stage, for example, when insufficient data are used, when modeling errors are
introduced or when the data are non-stationary and/or non-linear. Computational
noise places a limitation on how a statistic may be interpreted.
FIGURE 3.3: Noise is a quantity that can, within limits, be arbitrarily defined.
Shown above is an example where the recorded signal x_c[n] is decomposed
into the real signal x_c(nΔ) and some noise η(nΔ). In (a) η(nΔ) consists of all
frequencies above 3 Hz, whilst in (b) it is defined by absolute amplitudes above
0.75. Once the type of noise is defined it can be removed from a signal prior
to analysis. Whilst the case in (a) is more realistic in practice, there is no
theoretical reason why noise such as that in (b) cannot be used. In this figure
the sampling frequency is F_s = 512 Hz, so that 1 second is equivalent to N = 512
samples.


Measurement and computational noise both result in errors in 
an estimated statistic. 


3.3.0.3 Stationarity and Windowing 


Most mathematical tools assume stationarity in a signal, that is, they assume
that the information in x_c[n] (both statistical and dynamical) remains the
same under arbitrary time shifts. However, the EEG is indisputably
non-stationary: differences in EEG time series between sleep, wakefulness,
eyes open, eyes closed, etc. are evident. The very attempt to differentiate
between non-seizure, pre-seizure and seizure states at different times implies
non-stationarity [37].

FIGURE 3.4: An example of using non-overlapping one-second windows is
seen in (a). In (b) the windows are shown to overlap by 0.5 seconds, thus
the first window has k = 1 and the second k = 1 + N/2. In this figure the
sampling frequency is F_s = 512 Hz, so that 1 second is equivalent to N = 512
samples.

To reflect changes over time, a signal x_c[n] is windowed by dividing the
time axis into sections that may or may not overlap. Examples of overlapping
and non-overlapping windows are shown in Figure 3.4.

The most commonly used window is the rectangular window, defined as

$$H_k[n] = \begin{cases} 1, & \text{if } k+1 \le n \le k+N \\ 0, & \text{otherwise} \end{cases} \qquad (3.4)$$

for a window of length N, starting at time sample n = k + 1. Windowing
is applied by multiplying the signal by this function, that is, x_c[n] H_k[n].
Between times k + 1 ≤ n ≤ k + N the signal x_c[n] remains unchanged. Outside
this range the windowed signal is zero. This is shown in Figure 3.5(a), where
the dotted line is the original signal and the solid line the resultant windowed
signal.
FIGURE 3.5: The effects of applying a window over an analysis window. In (a)
a rectangular window defined in Equation 3.4 is applied to the original signal
by multiplying the two together. In (b) the edge effects are less pronounced
using a Hanning window defined in Equation 3.5. In both cases all values
outside the analysis window are zero. In this figure the sampling frequency
is F_s = 512 Hz, so that 1 second is equivalent to N = 512 samples.


The problem with rectangular windows is that the sharp edges of the
rectangle can affect analysis. Different types of windows exist to deal with this
problem, a common one being the Hanning window, defined as

$$H_k[n] = \begin{cases} 0.5\left(1 - \cos\left(\dfrac{2\pi (n - k - 1)}{N - 1}\right)\right), & \text{if } k+1 \le n \le k+N \\ 0, & \text{otherwise} \end{cases} \qquad (3.5)$$

An example of this is shown in Figure 3.5(b).
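Equations 3.4 and 3.5 translate directly into code. This sketch assumes NumPy and 0-based array indices, so the window that covers n = k + 1, ..., k + N in the text covers indices k, ..., k + N − 1 here. The Hanning values match NumPy's own np.hanning, which uses the same N − 1 denominator.

```python
import numpy as np

def rectangular_window(length, k, N):
    """H_k[n] of Equation 3.4: 1 on the N samples starting at index k, 0 elsewhere."""
    h = np.zeros(length)
    h[k:k + N] = 1.0
    return h

def hanning_window(length, k, N):
    """H_k[n] of Equation 3.5: a raised cosine on the same N samples, 0 elsewhere."""
    h = np.zeros(length)
    m = np.arange(N)                                   # position inside the window
    h[k:k + N] = 0.5 * (1.0 - np.cos(2.0 * np.pi * m / (N - 1)))
    return h

fs = 512
x = np.sin(2 * np.pi * 10 * np.arange(4 * fs) / fs)    # hypothetical 4 s signal
N, k = fs, 256                                         # a 1 s window starting at 0.5 s
rect = x * rectangular_window(len(x), k, N)            # windowing is a product
hann = x * hanning_window(len(x), k, N)
```

In practice one would slice x[k:k + N] (optionally multiplying by np.hanning(N)) rather than build full-length arrays of zeros; the long form above simply mirrors the equations.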

Windowing effectively enforces an (artificial) stationarity by making the
local signal within it globally valid in analysis. Of course, this means that
no information about the signal outside the window is used. A rectangular
window does not guarantee continuity at the edges, but other windows (like the
Hanning) do, because they taper smoothly to zero. This goes a long way toward
explaining the use of Hanning windows, in particular if Fourier analysis is used,
where the assumption is that the data within the window are repeated outside of it.

We want to make windows short because stationarity in x_c[n] cannot be
assumed when long windows are used. However, N must also be long enough
that the computed statistic is informative (which for some methods,
particularly those presented in Section 3.3.4, may require very long windows).
In practice, rather than restricting analysis to tools only valid for
non-stationary signals (a minuscule number compared to those designed for
stationary signals), the issue is pragmatically resolved by assuming that
windows of up to 20-30 seconds are weakly stationary or almost-stationary [37].
This is an approximation, and even on this time scale non-stationary behavior
can be observed; for example, short abrupt bursts are very common in the EEG.


The assumption of weak stationarity over 20-30 second windows
of EEG data underpins most signal processing tools used to
extract features from EEG sequences. The computational noise
introduced by non-stationarities in these windows is assumed
to be negligible.


To analyze a signal over time it is sufficient to vary the time k at which the
window is applied. If successive values of k are spaced s samples apart,
consecutive windows of length N overlap by N − s samples (s = N gives
non-overlapping windows). In some cases it is also possible
to vary the length N of each successive window.
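The time evolution described above amounts to stepping k through the record. A minimal sketch, assuming NumPy and a step size between window starts (the parameter `step` below, an illustrative name): consecutive windows overlap by N − step samples, so step = N gives the non-overlapping windows of Figure 3.4(a) and step = N/2 the half-overlapping ones of Figure 3.4(b).

```python
import numpy as np

def sliding_windows(x, N, step):
    """Stack the windows x[k:k + N] for k = 0, step, 2 * step, ...
    Windows that would run past the end of x are dropped."""
    starts = range(0, len(x) - N + 1, step)
    return np.stack([x[k:k + N] for k in starts])

fs = 512
x = np.arange(4 * fs, dtype=float)                     # stand-in for 4 s of EEG samples
non_overlapping = sliding_windows(x, N=fs, step=fs)    # as in Figure 3.4(a)
half_overlap = sliding_windows(x, N=fs, step=fs // 2)  # as in Figure 3.4(b)
```

NumPy 1.20 and later also provide np.lib.stride_tricks.sliding_window_view(x, N)[::step], which yields the same windows as a read-only view without copying.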


3.3.0.4 Linearity, Non-Linearity, Determinism and Stochasticity 


The signal x_c[n] can be generated by a system that is linear, non-linear or
stochastic. The difference between a linear and a non-linear system is that
the former preserves the operations of addition and scalar multiplication. The
behavior B is linear if for any two trajectories x_c1[n], x_c2[n] that belong to B
(x_c1[n], x_c2[n] ∈ B), and for all real constants a and b,

$$a\,x_{c1}[n] + b\,x_{c2}[n] \in \mathcal{B}. \qquad (3.6)$$

That is, the behavior is linear if any scaled sum of two trajectories also belongs
to the behavior B. If Equation 3.6 does not hold, then the system is non-linear.
It is safe to assume that the EEG signal is an observation from a non-linear
system.
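Equation 3.6 can be probed numerically when the trajectories come from feeding inputs through a known map S: if S(a u1 + b u2) = a S(u1) + b S(u2), then the combination of two trajectories is itself a trajectory. The sketch below, assuming NumPy, checks this on random inputs for two hypothetical maps, a moving-average filter and a squarer. Passing a finite number of random trials is evidence rather than proof of linearity, whereas a single failure demonstrates non-linearity.

```python
import numpy as np

def looks_linear(system, n_trials=5, length=256, seed=0):
    """Check superposition, system(a*u1 + b*u2) == a*system(u1) + b*system(u2),
    on random inputs and random real constants a, b."""
    rng = np.random.default_rng(seed)
    for _ in range(n_trials):
        u1, u2 = rng.standard_normal((2, length))
        a, b = rng.standard_normal(2)
        combined = system(a * u1 + b * u2)
        if not np.allclose(combined, a * system(u1) + b * system(u2)):
            return False
    return True

def moving_average(u, width=5):
    """A linear map: convolution with a fixed kernel."""
    return np.convolve(u, np.ones(width) / width, mode="same")

def squarer(u):
    """A non-linear map."""
    return u ** 2
```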

To analyze the signal generated by a complex system we have linear as
well as non-linear tools. Linear tools can be applied to signals generated
by non-linear systems, effectively modeling the non-linearities as a stochastic
component. However, linear tools are limited in what they can tell us, and
this approximation may miss information resulting from the non-linearities.
Non-linear signal processing tools are more effective in dealing with this type
of data, at the expense of greater computational cost, and they typically require
stationarity over longer windows (more than 20-30 seconds).

Stochastic elements are often assumed to arise from noise and artifact 
introduced by measurement, although unknown processes within the brain are 
also described as stochastic in models of the generating systems (see Chapter 
6). Stochasticity is dealt with by estimating its effect on a particular signal 
processing tool. 

Far more linear signal processing tools exist than non-linear ones.
Both typically assume stationarity of the signal, although some methods exist
that do not. Linear methods are described in Section 3.3.1, Section 3.3.2 and
Section 3.3.3. Non-linear analysis tools are described in Section 3.3.4.


Linear signal processing tools are prevalent, and far better un- 
derstood than non-linear ones. Non-linear tools typically require 
more data, in which case the stationarity, quantity and quality 
of the data must be taken into account. 


3.3.0.5 Normalization 


Normalization applies both to the raw data (usually in the preprocessing
stage, described in Section 3.2) and to the computed features. A feature ζ_s
presented to a classifier must be normalized so that it can be compared to the
same feature extracted from any other data of the same nature. Normalization
depends on the type and length of the window used.

Often it is possible to incorporate the normalization in the process of es- 
timating the statistic. In cases where this is not so (because no appropriate 
normalization exists) a normalized comparison can be achieved by using rela- 
tive rather than absolute measures, for example, comparing a feature relative 
to background activity. Normalization is imperative to the analysis of a sys- 
tem such as the EEG in which signals vary so significantly between states of 
awareness and between different people. The features discussed in this section 
are based on normalized data sequences unless otherwise stated. 
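A relative measure of the kind described above can be sketched as a ratio, assuming NumPy: the variance of the analysis window divided by the variance of a background segment. Which segment counts as ‘background’ (for example, a stretch of EEG preceding the window) is an application choice assumed here, not prescribed by the text. Because numerator and denominator scale together, the amplifier gain cancels.

```python
import numpy as np

def relative_variance(window, background):
    """Variance of the analysis window relative to a background segment (unitless)."""
    return np.var(window) / np.var(background)

rng = np.random.default_rng(3)
background = rng.standard_normal(5120)         # hypothetical 10 s background at 512 Hz
window = 3.0 * rng.standard_normal(512)        # hypothetical 1 s high-amplitude window

ratio = relative_variance(window, background)
ratio_other_gain = relative_variance(10.0 * window, 10.0 * background)
```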


Relative and normalized statistics are important in a recording
environment where large variations in signals are observed at
different times or between different records, and where the length
of observation time or window length needed is not always
obvious.
To summarize, an EEG signal x_c[n] is a noisy measurement of
brain activity.


Features extracted from data using linear signal processing
techniques may not be accurate if x_c[n] has a non-stationary or
non-linear origin.


For practical reasons analysis of EEG often assumes weak 
stationarity in data windows of less than 30 seconds. Non-linear 
signal processing requires even longer windows of stationarity to 
be valid. 


Normalization is imperative for the interpretation and com- 
parison of data. 


The remainder of this section discusses feature extraction methods relevant
to the analysis of the epileptic EEG. The tools presented are applicable to any
arbitrary discrete time signal.


We define the signal y[n] to be an arbitrary time signal. During
analysis the signal is windowed using a function H_k[n] of length
N. This window is typically rectangular and is incorporated
implicitly as part of the computation. This means that in EEG
analysis a windowed signal is denoted y_k[n] = x_c[n] H_k[n].
Time evolution is achieved by varying the time k at which the
window is applied.


For brevity theory is limited to discrete time analysis. Analogous results 
for continuous time signals exist and can be found in generic signal processing 
books such as [64], [138] and [143]. 


3.3.1 Time Domain Analysis 


A signal y[n] is a function of time n. It is defined within the observation 
window that depends on time. It can take values in a certain range, known as 
the measurement range. Estimating features that depend on time is known as 
time domain analysis. For the classification of the epileptic EEG time domain 
analysis is often used to give a numerical representation to qualitative visual 
observations, for example the increase in amplitude, increase in regularity 
and increase in synchronicity observed during epileptic events. Statistics that
characterize these observables are described in this section. 


3.3.1.1 Signal Amplitude (Energy) and Variance (Power) 


A signal's instantaneous amplitude at time n is given by |y[n]|, where |·| is
the magnitude. We will call this quantity the signal's energy. Its square,
$|y[n]|^2$, we call the signal's power. Both power and energy give an idea of the
magnitude of y[n] at time n. Because of the squared term, power emphasizes
changes more than energy does but is consequently more affected by noise. y[n],
|y[n]| and $|y[n]|^2$ are all time domain signals because these transformations
preserve dependence on time.

Instantaneous energy is rarely able to say anything about the waveform 
that cannot already be observed in the original signal. Averages over time 
are more useful because emphasis is placed on mean behavior — recall that 
estimates of statistics are only meaningful in ensembles rather than single 
instances. The mean of a signal y[n] is estimated as 


$$\mu_y[k] = \frac{1}{N}\sum_{n=k+1}^{k+N} y[n] \qquad (3.7)$$


$\mu_y[k]$ is the mean of sequence y[n] of length N starting at time k. Here y[n]
can be the original recorded signal or can be replaced by any other signal such
as its energy |y[n]|, its power $|y[n]|^2$ or another transformation altogether. If
y[n] is stationary with true mean $\mu_y$, then the larger the value of N, that is,
the more samples used, the closer the estimate $\mu_y[k]$ is to the true mean
$\mu_y$. This can be observed in Figure 3.1.
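Equation 3.7 translates almost directly into code. The sketch below is a minimal illustration, assuming NumPy and a 0-based array, so that the text's window n = k+1, ..., k+N becomes the slice y[k:k+N]; the function name window_mean and the synthetic test signal are hypothetical. It shows the estimate settling towards the true mean as N grows.

```python
import numpy as np

def window_mean(y, k, N):
    """Mean of the N samples y[k+1], ..., y[k+N] (Equation 3.7).

    The text counts samples from 1; NumPy arrays count from 0, so the
    text's sample y[n] is y[n - 1] here and the window is y[k:k + N].
    """
    return y[k:k + N].sum() / N

rng = np.random.default_rng(0)
y = 2.0 + rng.standard_normal(20_000)   # synthetic stationary signal, true mean 2.0

for N in (10, 100, 1_000, 10_000):
    print(N, window_mean(y, 0, N))      # estimates tend towards 2.0 as N grows
```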

The variance of a signal y[n] is a statistical quantity that gives an idea 
of its spread and regularity by computing how much, on average, it deviates 
from its mean. An unbiased estimator of the variance of y[n] is defined as 


$$\sigma_y^2[k] = \frac{1}{N-1}\sum_{n=k+1}^{k+N} \left(y[n] - \mu_y[k]\right)^2 \qquad (3.8)$$


where $\sigma_y^2[k]$ is the variance of a sequence y[n] of length N and $\mu_y[k]$ is
the mean as calculated by Equation 3.7. The square root of the variance, $\sigma_y[k]$,
is known as the standard deviation.
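A matching sketch of Equation 3.8, again assuming NumPy and 0-based slicing, with hypothetical function names window_variance and window_std. The 1/(N-1) factor is what makes the estimator unbiased; NumPy's np.var(..., ddof=1) computes the same quantity.

```python
import numpy as np

def window_variance(y, k, N):
    """Unbiased variance sigma_y^2[k] of y over one window (Equation 3.8)."""
    segment = y[k:k + N]                           # 0-based slice of the window
    mu = segment.sum() / N                         # mu_y[k], Equation 3.7
    return ((segment - mu) ** 2).sum() / (N - 1)   # 1/(N-1) makes it unbiased

def window_std(y, k, N):
    """Standard deviation sigma_y[k]: square root of the variance."""
    return np.sqrt(window_variance(y, k, N))

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(window_variance(y, 0, 5))   # 2.5, the same as np.var(y, ddof=1)
print(window_std(y, 0, 5))
```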

Sometimes it is necessary to compare statistics extracted from signals that
have significantly different mean values. In such cases a new statistic can be
defined where the spread of a signal, measured by its standard deviation, is
normalized by its mean as

$$c_y[k] = \frac{\sigma_y[k]}{\mu_y[k]} \qquad (3.9)$$

Here $c_y[k]$ is known as the coefficient of variation (COV). This normalization
is not ideal when $\mu_y[k]$ is close to zero because $c_y[k]$ becomes very
sensitive to changes in $\sigma_y[k]$. This limits the usefulness of COV to non-zero
mean signals such as |y[n]| or $|y[n]|^2$.
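The coefficient of variation can be sketched the same way. This assumes NumPy, uses the standard-deviation form of Equation 3.9, and uses a synthetic Gaussian signal in place of EEG; window_cov is a hypothetical helper. Comparing |y[n]| with the raw zero-mean signal illustrates why COV is restricted to non-zero mean signals.

```python
import numpy as np

def window_cov(y, k, N):
    """Coefficient of variation c_y[k] = sigma_y[k] / mu_y[k] (Equation 3.9).

    Only meaningful when mu_y[k] is well away from zero, e.g. for |y[n]|
    or |y[n]|**2 rather than for a raw zero-mean signal.
    """
    segment = y[k:k + N]
    mu = segment.mean()
    sigma = segment.std(ddof=1)      # square root of the unbiased variance (Eq. 3.8)
    return sigma / mu

rng = np.random.default_rng(1)
x = rng.standard_normal(5120)        # zero-mean synthetic test signal

print(window_cov(np.abs(x), 0, 5120))  # stable: the mean of |x| is about 0.8
print(window_cov(x, 0, 5120))          # large and sign-unstable: the mean of x is near 0
```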

Short-term non-stationarities (e.g., transient bursts) can affect the estimate
of $\mu_y[k]$, and thus COV and variance, significantly. If we want to avoid this
it is possible to talk instead about the variability of a signal rather than its
variance. One way to describe variability is to measure how much a signal moves
up and down from one sample to the next. A statistic of this nature is the total
variation $v_y[k]$, defined over the analysis window as [110]


$$v_y[k] = \frac{\sum_{n=k+2}^{k+N} \left|y[n] - y[n-1]\right|}{(N-1)\left(\max_y[k] - \min_y[k]\right)} \qquad (3.10)$$


It is defined only for non-constant signals y[n] with maximum value $\max_y[k]$
and minimum value $\min_y[k]$ over the analysis window, defined as

$$\min_y[k] = \min_{k+1 \le n \le k+N} y[n] \qquad (3.11)$$

$$\max_y[k] = \max_{k+1 \le n \le k+N} y[n]. \qquad (3.12)$$


The dividing term $(\max_y[k] - \min_y[k])$ included in Equation 3.10 normalizes
results by the range of y[n], thus making it comparable to estimates computed
at other times. $v_y[k]$ takes on values between $\frac{1}{N-1}$ and 1, that is,
$v_y[k] \in [\frac{1}{N-1}, 1]$. Slow, smooth signals have low total variation, with
the lower bound $v_y[k] = \frac{1}{N-1}$ achieved by functions that change monotonically
between $\min_y[k]$ and $\max_y[k]$ over the analysis window ($n = k+1, k+2, \ldots, k+N$).
Fast, large oscillations increase total variation towards 1, with the upper
bound $v_y[k] = 1$ achieved when y[n] alternates between $\max_y[k]$ and $\min_y[k]$ at
each consecutive n.
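Equations 3.10 to 3.12 can be sketched as follows, assuming NumPy and 0-based slicing; total_variation is a hypothetical name. The two test signals reproduce the bounds discussed above: a monotonic ramp gives the lower bound 1/(N-1) and a signal that alternates between its minimum and maximum gives the upper bound 1.

```python
import numpy as np

def total_variation(y, k, N):
    """Normalized total variation v_y[k] (Equations 3.10 to 3.12).

    Sum of |y[n] - y[n-1]| over the window, divided by (N - 1) times
    the window range max_y[k] - min_y[k].
    """
    segment = y[k:k + N]
    y_range = segment.max() - segment.min()
    if y_range == 0:
        raise ValueError("total variation is undefined for a constant window")
    return np.abs(np.diff(segment)).sum() / ((N - 1) * y_range)

N = 100
ramp = np.linspace(0.0, 1.0, N)             # monotonic: lower bound 1/(N-1)
alternating = np.tile([0.0, 1.0], N // 2)   # min/max at every step: upper bound
print(total_variation(ramp, 0, N), 1 / (N - 1))
print(total_variation(alternating, 0, N))   # 1.0
```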

Let us now see how $\mu_y[k]$, $\sigma_y^2[k]$, $c_y[k]$ and $v_y[k]$ can be used for the
purposes of detecting changes over time. Tracking these quantities requires the
choice of the window length N used to compute each statistic, which can be
problematic for mean, variance and COV. Long windows give a better estimate of
long-term behavior, but short events are averaged out and cannot be identified.
Conversely, short windows provide better temporal resolution and detect
short events, but do not yield good (or representative) estimates. Total
variation is more suitable for short windows of data. Fortunately all these
statistics can be computed using relatively small N (a few thousand samples
is often sufficient), so a suitable compromise is possible.

When tracking is specific to the detection of epileptic events ($y[n] = x_e[n]$),
the mean of the raw EEG signal $x_e[n]$ is not so useful because it converges
to a DC offset that does not change fast enough to characterize epileptic
events. Energy and power can be better discriminators. Because DC offsets
affect estimates of variance, COV and total variation, $x_e[n]$ is detrended (its
mean is removed) prior to computation. In any case the mean of an EEG signal is
meaningless because it depends on the referencing system used. Voltage is a
relative quantity defined only up to a constant (see Chapter 2).

Examples of mean signal energy, power, variance, COV and total variation 
for EEG are explored in Figure 3.6, where each is computed for a single EEG 
channel $x_e[n]$ using different window sizes (N = 512 and N = 5120, corresponding
to 1 and 10 seconds of a signal sampled at 512Hz). Calculations
are done once a second by incrementing k by 512 samples at each iteration. 
These estimates are computed from the same seizure shown in Figure 1.10 
that occurs at approximately 620 seconds. Notice that short time windows 
give better representation of short events but the resulting estimates are not 
as smooth. All methods show some degree of change during the seizure, but 
the type of change is different for each method. The features in this figure 
are powerful in describing average trends but are not enough to discrimi- 
nate between non-seizure, pre-seizure and seizure activity when a larger EEG 
database is studied. 
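The tracking procedure used for Figure 3.6 (a window of N samples advanced by 512 samples, i.e. once a second at 512Hz) can be sketched as a loop over window start times. This is a minimal illustration assuming NumPy; track is a hypothetical helper, and the synthetic signal with an artificial high-amplitude segment merely stands in for an EEG channel containing a seizure.

```python
import numpy as np

def track(y, feature, N, hop):
    """Apply feature() to windows of N samples whose start k advances by hop samples."""
    starts = range(0, len(y) - N + 1, hop)
    return np.array([feature(y[k:k + N]) for k in starts])

Fs = 512                                     # sampling rate in Hz, as in Figure 3.6
rng = np.random.default_rng(2)
x = rng.standard_normal(60 * Fs)             # 60 s synthetic stand-in for one channel
x[30 * Fs:40 * Fs] *= 4.0                    # hypothetical high-amplitude "event"

energy = np.abs(x)                           # |y[n]|, the text's "energy"
mean_energy_1s = track(energy, np.mean, N=Fs, hop=Fs)        # N = 512
mean_energy_10s = track(energy, np.mean, N=10 * Fs, hop=Fs)  # N = 5120
print(len(mean_energy_1s), len(mean_energy_10s))             # 60 51
```

Any of the other window statistics (variance, COV, total variation) can be passed as feature in place of np.mean.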

Prior to analysis the EEG sequence is normalized to unit variance. Now 
that both mean and variance have been introduced we can formally define a 
detrended normalized signal (described in Section 3.2) as 
$$y_{\text{normalized}}[n] = \frac{y[n] - \mu_y}{\sigma_y} \qquad (3.13)$$


where $\mu_y$ and $\sigma_y$ are the mean and standard deviation of the entire signal
y[n], before windowing. When tracking changes it is also possible and sometimes
necessary to apply this normalization only to the (shorter) analysis
window.

Both numerator and denominator in this equation have the same units and
thus the resulting $y_{\text{normalized}}[n]$ is dimensionless and scale invariant. This
normalization should be applied to any signal prior to analysis. This has been
done for the examples presented in Figure 3.6, and is always implied in the work that follows.
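Equation 3.13 amounts to a z-score. The sketch below assumes NumPy and estimates $\sigma_y$ with the N − 1 denominator of Equation 3.8 (an assumption; dividing by N changes the result only slightly for long signals); normalize is a hypothetical name. It also checks invariance to an added offset and to positive rescaling.

```python
import numpy as np

def normalize(y):
    """Detrended, unit-variance signal (Equation 3.13): (y - mu_y) / sigma_y."""
    return (y - y.mean()) / y.std(ddof=1)

rng = np.random.default_rng(3)
y = 50.0 + 20.0 * rng.standard_normal(2048)   # arbitrary offset and scale
z = normalize(y)

# Changing units (positive scale) or reference (offset) gives the same result
print(np.allclose(normalize(1000.0 * y), z))  # True
print(np.allclose(normalize(y + 7.0), z))     # True

# Per-window variant: normalize only the (shorter) analysis window
window = normalize(y[512:1024])
```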

It should also be noted that even after normalization statistics like mean 
energy, mean power and variance only make sense as relative measures. For 
example we can say that the variance in Figure 3.6(c) increases during the 
seizure relative to the non-seizure EEG, but its raw value does not matter. In 
contrast the total variation statistic is truly scale invariant because its range
is well defined and changing the scale of y[n] does not change the estimated
statistic.


3.3.1.2 Periodicity (Auto-Correlation) 


The auto-correlation function gives an idea of how much a signal repeats 
itself, and thus can be used to identify regularity. For a real signal y[n] the 
auto-correlation function is defined as 


$$\mathrm{CORR}_y[\tau,k] = \frac{1}{N}\sum_{n=k+1}^{k+N-\tau} y[n+\tau]\,y[n], \qquad 0 \le \tau < N. \qquad (3.14)$$
FIGURE 3.6: Computations of (a) mean energy (|y[n]|), (b) mean power
($|y[n]|^2$), (c) variance, (d) coefficient of variation (COV) and (e) total variation
of EEG channel amplitude $|y[n]| = |x_e[n]|$. In all cases the entire data
sequence was normalized as in Equation 3.13 prior to windowing. Tracking
was performed using different length windows. The signal is sampled at
$F_s$ = 512Hz and thus 1 second windows correspond to N = 512 samples and
10 second windows have N = 5120 samples. All methods show that a degree of
change is observed when the seizure occurs between 620-750 seconds, although
this change is not necessarily consistent and these time domain methods may
not by themselves be used to recognize seizures. In all cases, a longer window
is shown to give smoother estimates, less susceptible to transient peaks and
troughs in the data. (Continued)

FIGURE 3.6: (Continued)


Assuming stationarity in y[n] over the analysis window, this function gives
high values at delays τ for which y[n] is most like itself, and values close to
zero for delays for which y[n] is least like itself. The maximum value occurs
at τ = 0 because at zero delay y[n] exactly equals y[n + τ]. In fact for a
zero mean signal $\mathrm{CORR}_y[0,k]$ is, for large N, equivalent to the variance
$\sigma_y^2[k]$ defined in Equation 3.8.

Equation 3.14 is defined for $-N < \tau < N$, but because it is symmetric for
stationary signals it is sufficient to look at values for $0 \le \tau < N$. For
non-stationary y[n] Equation 3.14 is not symmetric and the values of
$\mathrm{CORR}_y[\tau,k]$ for $\tau > 0$ may be larger than $\mathrm{CORR}_y[0,k]$. When it is not
known whether y[n] is stationary or not, this can be used as an indicator. In
practice, as discussed earlier, stationarity is implicitly forced onto y[n] by
windowing the signal with a small enough N.

Equation 3.14 is appropriate to identify regular and oscillatory behavior
because these signals repeat themselves often, that is, $\mathrm{CORR}_y[\tau,k]$ is
maximized for values of τ for which

$$y[n+\tau] \approx y[n]. \qquad (3.15)$$

Periodicity of the signal y[n] can be inferred from periodicity in the
correlation function.

To visualize the effects of the auto-correlation, consider the example in
Figure 3.7, which uses the two signals shown in Figure 3.9(a): random white
noise and a 10Hz sinusoid. Figure 3.7(a) shows their auto-correlations. For
the random noise $\mathrm{CORR}_y[\tau,k]$ is very small for all delays except τ = 0,
where the auto-correlation is 1. The sinusoid, however, shows maxima every
0.1 seconds because y[n] repeats itself at 10Hz. There are also minima every
0.1 seconds, offset by 0.05 seconds from the maxima, corresponding to delays at
which the sinusoid is a negative copy of itself. When the 10Hz sinusoid is
corrupted with random phase noise, as in (b), the peaks and troughs of
$\mathrm{CORR}_y[\tau,k]$ are preserved, albeit at a smaller magnitude and with more
fluctuations. This is so even though the levels of noise are quite high:
auto-correlation is a fairly robust measure.

In practice y[n] is finite and Equation 3.14 is a biased estimate. For
example, notice in Figure 3.7(a) and (b) that the auto-correlation at multiples
of τ = 0.1 seconds decreases with increasing τ, even though the sinusoid is an
exact replica of itself and the auto-correlation should yield the same number.
This occurs because with higher τ fewer samples are summed while the dividing
1/N term stays the same, which biases values at larger τ downwards. An unbiased
estimate is achieved by modifying Equation 3.14 to

$$\mathrm{CORR}_y[\tau,k] = \frac{1}{N-\tau}\sum_{n=k+1}^{k+N-\tau} y[n+\tau]\,y[n], \qquad 0 \le \tau < N, \qquad (3.16)$$

thereby accounting for the fewer samples used for higher τ. The effects of
biased versus unbiased estimation are shown in Figure 3.7(c). Equation 3.16
is used from now on.
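The difference between the biased (Equation 3.14) and unbiased (Equation 3.16) estimates can be sketched directly, assuming NumPy and a single 0-based window y of length N; autocorr is a hypothetical helper. The test signal is a unit-power 10Hz sinusoid sampled at 512Hz, so peaks are expected near multiples of 0.1 s, i.e. about 51 samples.

```python
import numpy as np

def autocorr(y, max_lag, unbiased=True):
    """Auto-correlation of one window y (length N) for lags 0..max_lag-1.

    Biased (Equation 3.14): divide the sum of products by N.
    Unbiased (Equation 3.16): divide by N - tau, the number of products summed.
    """
    N = len(y)
    r = np.empty(max_lag)
    for tau in range(max_lag):
        s = np.dot(y[tau:], y[:N - tau])   # sum over n of y[n + tau] * y[n]
        r[tau] = s / (N - tau) if unbiased else s / N
    return r

Fs = 512
n = np.arange(Fs)                                  # 1 s window, N = 512
y = np.sqrt(2) * np.sin(2 * np.pi * 10 * n / Fs)   # unit-power 10 Hz sinusoid

biased = autocorr(y, 200, unbiased=False)
unbiased = autocorr(y, 200, unbiased=True)
first_peak = int(np.argmax(biased[20:80])) + 20    # first peak after tau = 0
print(first_peak)                                  # close to 51 samples (0.1 s)
```

The two estimates differ only by the factor (N − τ)/N, which is exactly the shrinkage at large τ visible in Figure 3.7(a) and (b).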

Finally, the comparison of the auto-correlations between functions of
different magnitudes is not possible because the values of $\mathrm{CORR}_y[\tau,k]$
are not scaled to compatible ranges. Measures such as auto-correlation should
not depend on magnitude because the amount of information contained in y[n]
is exactly the same as in $a\,y[n]$, for any non-zero constant $a \in \mathbb{R}$. A
normalized auto-correlation in which the values always lie in [−1, 1] can be
achieved by dividing $\mathrm{CORR}_y[\tau,k]$ by its power, $\mathrm{CORR}_y[0,k]$, so that,
assuming stationarity in y[n],
FIGURE 3.7: The computation of auto-correlation. (a) shows a biased
estimate for noise and sinusoid, the same signals shown in Figure 3.9(a). The
auto-correlation gives high values for several τ only if the signal is regular.
(b) shows how the addition of noise in a regular 10Hz signal can decrease
performance. The auto-correlation still shows significant structure, albeit at
smaller amplitudes, and is fairly resilient to noise. (c) shows an unbiased
estimate of the same signals as in (a). (d) shows the same unbiased estimate
as in (c), but this time normalized by signal power so that values always lie
in the range [−1, 1]. All signals here are sampled at $F_s$ = 512Hz, and so, for
example, τ = 0.125 seconds refers to a delay of τ = 512 × 0.125 = 64 samples.
(Continued)
(d) Normalized unbiased auto-correlation

FIGURE 3.7: (Continued)

$$\mathrm{CORR}_{y,\text{normalized}}[\tau,k] = \frac{\mathrm{CORR}_y[\tau,k]}{\mathrm{CORR}_y[0,k]}. \qquad (3.17)$$


Here the maximum value of 1 occurs at τ = 0 independent of the magnitude
of the signal. Normalized auto-correlation for the test signals is shown in 
Figure 3.7(d). Note also that this normalization is not required if the original 
signal y[n] is normalized to unit variance over the analysis window, as defined 
by Equation 3.13, prior to analysis. 
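Equation 3.17 only adds a division by the lag-0 value. This sketch assumes NumPy and reuses the unbiased estimate; normalized_autocorr is a hypothetical name. Note that with finite windows the unbiased estimate at large lags can occasionally exceed 1 in magnitude, since the [−1, 1] bound relies on the stationarity assumption.

```python
import numpy as np

def normalized_autocorr(y, max_lag):
    """Unbiased auto-correlation divided by its lag-0 value (Equation 3.17)."""
    N = len(y)
    r = np.array([np.dot(y[tau:], y[:N - tau]) / (N - tau) for tau in range(max_lag)])
    return r / r[0]     # r[0] = CORR_y[0, k], the power in the window

Fs = 512
n = np.arange(Fs)
y = np.sin(2 * np.pi * 10 * n / Fs)

r_small = normalized_autocorr(y, 100)
r_large = normalized_autocorr(250.0 * y, 100)       # same signal, 250x the amplitude
print(r_small[0], np.allclose(r_small, r_large))   # 1.0 True
```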

During a seizure there is often an observed increase in regularity in the 
EEG where the signal becomes more oscillatory. Thus when $y[n] = x_e[n]$ and
we are interested in detecting seizures, Equation 3.17 seems like an appropriate
feature. The problem is that this estimate requires stationarity that cannot 
be assumed even within a seizure because its frequency evolves over time. The 
auto-correlation for non-stationary signals is not very robust. See, for example,
Figure 3.8, where a chirp, a signal whose frequency evolves over time [163] (in
this case from 10Hz to 25Hz), is shown in (a). Although this is a very regular
signal its auto-correlation in (b) fails to yield high values because the times
at which the signal is most like itself are not fixed (i.e., there is no single τ at
FIGURE 3.8: Effects of auto-correlation for time-varying signals. (a) shows a
chirp, a sinusoid whose frequency evolves, in this case between 10-25Hz,
over the course of time. Even though the signal is quite regular,
auto-correlation fails to give high values for any τ other than τ = 0, and as such
auto-correlation is not suitable for non-stationary signals. Nevertheless
auto-correlation may be used for seizure detection, as shown in (c), where the
number of peaks and troughs observed in the auto-correlation in an EEG
channel is computed over time. The assumption is that fewer peaks occur for
repetitive signals. It is obvious that a change occurs when the seizure begins.
All signals here are sampled at $F_s$ = 512Hz. Tracking in (c) is done using 2
second (N = 1024 samples) non-overlapping analysis windows.
which this happens). Even the highly noisy example shown in Figure 3.7(b)
yields larger auto-correlation values than does the highly regular chirp.

Shorter windows, in which stationarity is more likely, are unsuitable for
detecting lower frequency seizures because the periodicity is not obvious in a
short amount of time. The applicability of auto-correlation as a detector is
therefore limited. Nevertheless in certain cases auto-correlation can be used to
characterize a seizure, as shown in Figure 3.8(c), and this feature can be used
as part of a feature set to detect seizures.
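Figure 3.8(c) tracks the number of peaks and troughs in the auto-correlation of each 2 second window. The text does not specify the exact counting rule, so the following is only a sketch under stated assumptions: NumPy, the normalized unbiased estimate of Equation 3.17, lags up to N/2, and extrema counted as sign changes of the first difference. The helper names are hypothetical, and a sinusoid and white noise stand in for rhythmic seizure activity and background EEG.

```python
import numpy as np

def unbiased_autocorr(y, max_lag):
    """Normalized unbiased auto-correlation (Equations 3.16 and 3.17)."""
    N = len(y)
    r = np.array([np.dot(y[tau:], y[:N - tau]) / (N - tau) for tau in range(max_lag)])
    return r / r[0]

def count_extrema(r):
    """Count interior peaks and troughs as sign changes of the first difference."""
    d = np.diff(r)
    return int(np.sum(d[1:] * d[:-1] < 0))

Fs, N = 512, 1024                          # 2 s window, as in Figure 3.8(c)
n = np.arange(N)
regular = np.sin(2 * np.pi * 10 * n / Fs)  # stand-in for rhythmic seizure activity
noise = np.random.default_rng(5).standard_normal(N)  # stand-in for background EEG

max_lag = N // 2                           # hypothetical choice of lag range
print(count_extrema(unbiased_autocorr(regular, max_lag)),   # few extrema
      count_extrema(unbiased_autocorr(noise, max_lag)))     # many extrema
```

Applied to consecutive non-overlapping windows of an EEG channel, the count would be expected to drop when the signal becomes more repetitive.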


3.3.1.3 Synchronization 


Qualitatively synchronization describes the amount of locking between signals. 
A measure of synchronicity gives an idea of how similar signals are to each 
other. Several methods exist to measure different types of synchronicity. The 
first is the same as the auto-correlation described by Equations 3.16 and 3.17 
but applied to two different signals $y_1[n]$ and $y_2[n]$. An unbiased estimate of
their linear cross-correlation is

$$\mathrm{XCORR}_{y_1,y_2}[\tau,k] = \frac{1}{N-\tau}\sum_{n=k+1}^{k+N-\tau} y_1[n+\tau]\,y_2[n], \qquad 0 \le \tau < N. \qquad (3.18)$$


The same issues as those described for auto-correlation apply and these 
are not repeated here. A normalized statistic suitable to compare signals of 
different amplitudes and lengths is given by 


$$\mathrm{XCORR}_{y_1,y_2,\text{normalized}}[\tau,k] = \frac{\mathrm{XCORR}_{y_1,y_2}[\tau,k]}{\sqrt{\mathrm{CORR}_{y_1}[0,k]\,\mathrm{CORR}_{y_2}[0,k]}} \qquad (3.19)$$


where the powers of both $y_1[n]$ and $y_2[n]$ are now used. The idea is that
large values result when two signals are lag synchronized, that is, when

$$y_1[n+\tau] \approx y_2[n]. \qquad (3.20)$$

High values result at time lags τ for which $y_1[n]$ and $y_2[n]$ have a similar
course in time. Two signals are synchronized if the time lag τ for which high
values of Equation 3.19 occur is similar for all time.
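Equations 3.18 and 3.19 can be sketched for a single window as follows, assuming NumPy; normalized_xcorr is a hypothetical name. Here y1 is a copy of a 10Hz sinusoid delayed by 10 samples, so, following Equation 3.20, the normalized cross-correlation should peak at τ = 10 with a value close to 1 (the unbiased finite-window estimate can slightly exceed 1).

```python
import numpy as np

def normalized_xcorr(y1, y2, max_lag):
    """Unbiased cross-correlation (Eq. 3.18) divided by sqrt of both powers (Eq. 3.19).

    Lag tau pairs y1[n + tau] with y2[n], so a peak at tau = d means
    y1 is (approximately) y2 delayed by d samples (Equation 3.20).
    """
    N = len(y1)
    p1 = np.dot(y1, y1) / N     # CORR_y1[0, k]
    p2 = np.dot(y2, y2) / N     # CORR_y2[0, k]
    x = np.array([np.dot(y1[tau:], y2[:N - tau]) / (N - tau) for tau in range(max_lag)])
    return x / np.sqrt(p1 * p2)

Fs = 512
n = np.arange(Fs)
y2 = np.sin(2 * np.pi * 10 * n / Fs)
y1 = np.roll(y2, 10)            # y1[n] = y2[n - 10]; exact here because the
                                # window holds a whole number of periods
xc = normalized_xcorr(y1, y2, 40)
print(int(np.argmax(xc)))       # 10
```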

The cross-correlation computed by Equation 3.19 is linear in that it assumes
that similarity between two channels is given by a linear combination
of $y_1[n]$ and $y_2[n]$. Non-linear methods exist in which correlation calculations
are maximized for times at which

$$y_1[n+\tau] = F(y_2[n]), \qquad (3.21)$$

where $F(\cdot)$ can be a non-linear transformation. There are ways to estimate
$F(\cdot)$ but these are computationally expensive, require large amounts of
FIGURE 3.9: Synchronization calculations. (a) and (b) show test cases: noisy
signal, 10Hz sinusoid, out of phase 10Hz sinusoid and a 20Hz sinusoid. (c)
and (d) show the mean phase coherence (Equation 3.24) and cross-correlation
(Equation 3.19) estimates for the test signals relative to a 10Hz sinusoid.
Notice that the mean phase coherence can pick out both out of phase and
different frequency synchronization, whilst cross-correlation cannot be used
when signals are of different frequency. Values are low in both cases for the
noisy case, as expected. (e) shows the sensitivity of mean phase coherence
when noise is introduced to signals known to be synchronized. Neither case is
identified. Finally (f) shows how cross-correlation between two channels may
be used for the detection of epileptic seizures. The measure counts the number
of peaks and troughs observed in the cross-correlation between two EEG
channels known to display synchronicity during seizure, under the assumption
that fewer peaks occur for synchronized signals. There is an obvious change at
the beginning of a seizure. All estimates are normalized to signal energy and
signals are sampled at $F_s$ = 512Hz. In (f) we use 2 second non-overlapping
windows ($N = 2 \times F_s = 1024$ samples). (Continued)


118 Epileptic Seizures and the EEG 


data and do not intuitively add to the material presented here. For more 
information, see, as an example, the work done in [28]. 

Another way that two signals can be synchronized is if they repeat them- 
selves at regular intervals, but do not necessarily evolve at the same frequency. 
This is called phase locking, an example of which is shown in Figure 3.9(b). 
Phase locked signals are said to be $T_1:T_2$ synchronized, where $T_1$ and $T_2$ are
real numbers that represent the ratio between the fundamental frequencies of
$y_1[n]$ and $y_2[n]$.

This type of synchronization can be measured using instantaneous phase,
where an arbitrary signal y[n] is decomposed into its instantaneous phase $\theta_y[n]$
and amplitude $A_y[n]$ as [28, 113]

$$\psi_y[n,k] = y[n] + i\,\hat{y}[n] = A_y[n,k]\,e^{i\theta_y[n,k]}, \qquad k+1 \le n \le k+N. \qquad (3.22)$$


$\hat{y}[n]$ is the Hilbert transform of y[n], and it ensures that $\psi_y[n,k]$ has both a
real and imaginary component⁴ so that the phase

$$\theta_y[n,k] = \arg\left(\psi_y[n,k]\right), \qquad k+1 \le n \le k+N \qquad (3.23)$$

exists. Here arg(z) indicates the angular component⁵ of a complex number
z. See, for example, [138] for the calculation of the Hilbert transform.
Note that this is only one method for determining phase information, and 
others such as wavelet transforms and correlation coefficients can be used as 
a substitute. The choice of method does not impact phase estimation results 
perceptibly [28, 112]. 
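The analytic signal of Equation 3.22 can be computed with the FFT by removing negative frequencies and doubling positive ones. This is a sketch assuming NumPy; analytic_signal is a hypothetical helper that follows the same construction as scipy.signal.hilbert, which also returns the analytic signal rather than $\hat{y}[n]$ alone. np.unwrap removes the 2π jumps from the phase so that it can grow beyond 2π, the role played by the integer m in the definition of arg.

```python
import numpy as np

def analytic_signal(y):
    """psi_y[n] = y[n] + i*yhat[n], with yhat the Hilbert transform of y (Equation 3.22).

    Built in the frequency domain: negative frequencies are zeroed and positive
    ones doubled (DC, and Nyquist for even N, are kept once).
    """
    N = len(y)
    h = np.zeros(N)
    h[0] = 1.0
    if N % 2 == 0:
        h[N // 2] = 1.0
        h[1:N // 2] = 2.0
    else:
        h[1:(N + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(y) * h)

Fs = 512
n = np.arange(Fs)
y = 3.0 * np.cos(2 * np.pi * 10 * n / Fs)   # test signal: amplitude 3, 10 Hz

psi = analytic_signal(y)
amplitude = np.abs(psi)                      # A_y[n]
phase = np.unwrap(np.angle(psi))             # theta_y[n], allowed to grow past 2*pi
print(np.allclose(amplitude, 3.0))           # True
```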

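Two signals $y_1$ and $y_2$ are said to be approximately $T_1:T_2$ synchronized
if $\Delta\theta_{y_1,y_2}[n,k] = T_1\theta_{y_1}[n,k] - T_2\theta_{y_2}[n,k]$ remains bounded for
all n, which requires the fundamental frequencies $f_1$ and $f_2$ of the two signals
to satisfy $T_1 f_1 \approx T_2 f_2$. This phase locking (or synchronization) can be
quantified by taking the mean phase coherence

$$\gamma_{y_1,y_2}[T_1,T_2,k] = \left|\frac{1}{N}\sum_{n=k+1}^{k+N} e^{i\,\Delta\theta_{y_1,y_2}[n,k]}\right|, \qquad (3.24)$$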


which is the magnitude of the time average of the unit phasors of the
instantaneous phase differences. $\gamma_{y_1,y_2}[T_1,T_2,k]$ has a maximum value of 1
when both signals are phase locked, that is, when $\Delta\theta_{y_1,y_2}[n,k]$ is constant
(for example zero) for all n, and a minimum value of 0 when no phase locking
occurs. Since most real signals are noisy it is unlikely that Equation 3.24
is ever either exactly one or exactly zero. It is expected that low values are
achieved for signals that are not phase locked in general and high values for
those that are.

⁴This is called making a signal analytic, and is in some cases useful in the analysis and
manipulation of signals.

⁵The angular component or argument of a complex number z is calculated as
$\arg(z) = \arctan\left(\frac{\mathrm{Im}(z)}{\mathrm{Re}(z)}\right) + 2m\pi$. Because the (quadrant-corrected)
arctan() function yields values only within a single interval of length 2π, m
is an integer used to ensure that phase differences greater than 2π can be
computed in Equation 3.24.

In contrast to lag synchronization, which uses both the phase and amplitude
information of $y_1[n]$ and $y_2[n]$, Equation 3.24 only uses the phase information
and therefore no amplitude normalization is necessary.
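Mean phase coherence (Equation 3.24) can be sketched on top of the instantaneous phase, assuming NumPy and an FFT-based analytic signal (the same construction used by scipy.signal.hilbert); instantaneous_phase and mean_phase_coherence are hypothetical names. With a 10Hz and a 20Hz sinusoid, T1 = 2 and T2 = 1 keep $T_1\theta_{y_1} - T_2\theta_{y_2}$ constant because 2 × 10Hz = 1 × 20Hz, giving γ close to 1, while T1 = T2 = 1 lets the phase difference drift and gives γ close to 0.

```python
import numpy as np

def instantaneous_phase(y):
    """Unwrapped phase theta_y[n] of the analytic signal (Equations 3.22 and 3.23)."""
    N = len(y)
    h = np.zeros(N)                  # keep DC (and Nyquist), double positive freqs
    h[0] = 1.0
    if N % 2 == 0:
        h[N // 2] = 1.0
        h[1:N // 2] = 2.0
    else:
        h[1:(N + 1) // 2] = 2.0
    z = np.fft.ifft(np.fft.fft(y) * h)
    return np.unwrap(np.angle(z))

def mean_phase_coherence(y1, y2, T1, T2):
    """gamma = |mean of exp(i * (T1*theta1 - T2*theta2))| (Equation 3.24), in [0, 1]."""
    dtheta = T1 * instantaneous_phase(y1) - T2 * instantaneous_phase(y2)
    return float(np.abs(np.mean(np.exp(1j * dtheta))))

Fs, N = 512, 1024
n = np.arange(N)
y1 = np.sin(2 * np.pi * 10 * n / Fs)         # 10 Hz
y2 = np.sin(2 * np.pi * 20 * n / Fs + 0.7)   # 20 Hz, arbitrary phase offset

print(mean_phase_coherence(y1, y2, 2, 1))    # close to 1
print(mean_phase_coherence(y1, y2, 1, 1))    # close to 0
```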

The effects of synchronization are explored in Figure 3.9. Test signals,
including random noise, phase locked signals of the same frequency and phase
locked signals of different frequencies, can be seen in (a) and (b). $T_1:T_2$
and lag synchronization are presented in (c) and (d) respectively. As
expected, neither type of synchronization responds to random noise. But
whereas $T_1:T_2$ synchronization gives high values for phase locked signals of
different frequencies (identifying the factor of two between frequencies), the
cross-correlation does not. This implies that mean phase coherence is a more
appropriate choice for measuring synchrony in systems in which two signals
are phase locked but not behaving similarly.

The effects of noise can be devastating on estimates of $\gamma_{y_1,y_2}[T_1,T_2,k]$,
as shown in Figure 3.9(e), whereas cross-correlation (like auto-correlation) is
fairly robust to it. This sensitivity, the computational expense incurred by
not knowing $T_1$ and $T_2$, and the long stationary windows of data required to
obtain stable estimates make mean phase coherence unsuitable for many tasks.

Synchronization has been described in Chapter 1 as a key observable during 
many types of seizures, and as such is a suitable feature to extract for the 
detection of epilepsy. Figure 3.9(f) shows how lag synchronization may be 
used to detect epileptic events using EEG data, where the temporal evolution 
between two channels, selectively chosen to produce good results, is seen to 
identify a seizure. 


Statistics discussed in this section describe time domain features
of the epileptic EEG.

• Amplitude: Energy |y[n]| and power $|y[n]|^2$ of a signal y[n]. Mean
values of these can be computed as in Equation 3.7.

• Variability and Regularity: Variance $\sigma_y^2[k]$, coefficient of variation
$c_y[k]$ and total variation $v_y[k]$ all give an idea of how variable y[n]
is by quantifying how much it deviates from its mean or across its
range. Periodicity can be deduced from the auto-correlation function
$\mathrm{CORR}_{y,\text{normalized}}[\tau,k]$ defined in Equation 3.17.

• Synchronicity: How much two signals $y_1[n]$ and $y_2[n]$ are like each
other can be estimated through Equation 3.19, where the normalized
cross-correlation $\mathrm{XCORR}_{y_1,y_2,\text{normalized}}[\tau,k]$ is defined, or through mean
phase coherence $\gamma_{y_1,y_2}[T_1,T_2,k]$ found in Equation 3.24.

To remove dependence on mean and scale, these statistics
should be applied to a signal y[n] that has been normalized as
in Equation 3.13 over the analysis window.


By themselves these statistics are insufficient to reliably de- 
tect seizures. 


3.3.2 Frequency Domain Analysis 


Frequency is a measure of how often an event occurs in a unit period of 
time. If a ball is bouncing once a second, then it is hitting the ground at a 
frequency of 1Hz, twice a second is 2Hz and so on. Many of the time domain 
methods presented in the previous section implicitly include the notion of 
frequency. For example, the auto-correlation detects repeated events at a 
particular frequency or period, just as the cross-correlation detects similarity 
in the frequencies between two signals. 

Signals such as the EEG contain events that occur at different frequencies. 
What these are is not always obvious in the time domain because events at 
many frequencies interfere with one another. To make the frequency content 
clearer, a transformation is applied to a signal y[n] so that it is defined
in terms of frequency ω rather than time n. For example, suppose that a
signal y[n] is the sum of two sinusoids at 10 and 20Hz respectively, as shown
in Figure 3.10(a) for both a noise-free and a noisy case. Here it is not so
obvious what these frequencies are, particularly in the noisy case. However,
when described in terms of ω it is clear, as in Figure 3.10(b), that there are
10 and 20Hz components. Even fairly high noise levels do not significantly
interfere with this observation.

A signal can always be constructed as a linear combination of basis
functions. In the time domain these basis functions are implicitly unit
impulses (Kronecker deltas) that isolate individual elements in time. That is,
the basis function is defined as

$$\delta_k[n] = \begin{cases} 1, & \text{if } k = n \\ 0, & \text{otherwise} \end{cases} \qquad (3.25)$$

The original time-domain signal can then always be reconstructed as

$$y[n] = \sum_{k=-\infty}^{\infty} y[k]\,\delta_k[n], \qquad n = 1, 2, 3, \ldots \qquad (3.26)$$

In the frequency domain the basis functions isolate the different frequency 
components of the signal y[n] by projecting it onto sinusoidal basis functions. 
Sinusoids are chosen because they are very good at isolating components of 
different frequencies. The signal is then said to be described in terms of 
its frequency components and is defined in the frequency domain. Estimating 
features that depend on frequency is known as frequency domain analysis. The 
FIGURE 3.10: The power of using the FFT, as shown in (a) and (b), is its
ability to reveal the frequency content in signals even in a noisy environment.
The test signals are in (a) and the corresponding PSDs in (b). A sampling
frequency of $F_s$ = 512Hz is used (1 second equals N = 512 samples); thus the
spectrum in (b) is defined between 0-256Hz, even though this figure shows
only frequencies up to 50Hz. All PSDs are normalized to total PSD energy.


transformation of a time domain signal to the frequency domain is known as 
the Fourier transform, named in honor of the French mathematician and
physicist Jean Baptiste Joseph Fourier. 

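For discrete-time finite time domain signals⁶ the fast Fourier transform
(FFT) of an arbitrary windowed signal y[n] for $n = k+1, k+2, \ldots, k+N$ is
given by

$$\mathrm{FFT}[\omega,k] = \sum_{n=k+1}^{k+N} y[n]\,e^{-i\omega n}, \qquad \omega = \frac{2\pi m}{N}, \quad m = 0, 1, \ldots, N-1. \qquad (3.27)$$

The sinusoidal basis functions $e^{-i\omega n} = \cos(\omega n) - i\sin(\omega n)$ are able to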


⁶The interested reader can find definitions of the Fourier transform for other types of
signal, including those defined in continuous time, in signal processing books such as [138]. 
FIGURE 3.11: The effects of using different windows when computing the
FFT. A test signal with 2 sinusoidal frequency components (10 and 20Hz) is
used. When a rectangular window, defined in Equation 3.4, is applied to the
time domain signal its PSD shows frequency content other than the 10 and
20Hz. This effect is reduced when the Hanning window defined in Equation
3.5 is used. Test signals are sampled at $F_s$ = 512Hz; thus the PSD is defined
for frequencies up to 256Hz, even though only the 1-100Hz range is shown.


isolate activity at different frequencies ω, measured in radians per sample. The
value of the FFT at each ω represents the relative contribution of events that
occur at that frequency to y[n]. The FFT is defined for $0 \le \omega = \frac{2\pi m}{N} < 2\pi$
with $m = 0, 1, \ldots, N-1$. That is, the range 0 to 2π is divided into equidistant
segments dependent on the number of samples in the windowed y[n]. To scale to
the correct frequency range in Hz, a conversion of $\omega = 2\pi f/F_s$ is necessary,
where $F_s$ is the sampling rate of the data and f is the frequency in Hz between
zero and $F_s$.

Equation 3.27 implicitly uses a rectangular window to define the analysis 
data. However the rectangular edges of the window can affect the accuracy 
of the computed statistic by causing the estimated content at different fre- 
quencies to interfere with one another. This is known as spectral leakage. To 
manage this effect the windowed signal y[n] is often first multiplied by a non- 
rectangular function such as the Hanning window shown in Figure 3.5(b). 
Although the Hanning window does not remove the spectral leakage it re- 
distributes it to nearby frequencies only. Thus better estimates are obtained 
at the expense of lower frequency resolution. An example is shown in Figure 
3.11. 
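Spectral leakage is easiest to see when a component falls between FFT bins. The sketch below assumes NumPy and uses hypothetical test frequencies of 10.5 and 20.5Hz (rather than the 10 and 20Hz of Figure 3.11) with $F_s$ = 512Hz and N = 512, so that each tone lies half-way between 1Hz bins. It compares the fraction of PSD power found more than 3Hz from either tone with a rectangular window and with np.hanning, NumPy's Hann (Hanning) window; leakage_fraction and the 3Hz guard band are assumptions for illustration.

```python
import numpy as np

Fs, N = 512, 512
n = np.arange(N)
tones = [10.5, 20.5]                 # hypothetical: half-way between 1 Hz bins
y = sum(np.sin(2 * np.pi * f0 * n / Fs) for f0 in tones)

def leakage_fraction(y, window, tones, Fs, guard_hz=3.0):
    """Fraction of PSD power lying more than guard_hz away from every tone."""
    psd = np.abs(np.fft.rfft(y * window)) ** 2
    f = np.fft.rfftfreq(len(y), d=1 / Fs)
    far = np.all(np.abs(f[:, None] - np.array(tones)[None, :]) > guard_hz, axis=1)
    return psd[far].sum() / psd.sum()

rect = leakage_fraction(y, np.ones(N), tones, Fs)      # implicit rectangular window
hann = leakage_fraction(y, np.hanning(N), tones, Fs)   # Hanning window
print(rect, hann)   # far less power away from the tones with the Hanning window
```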

The transformation to the frequency domain preserves all the information
contained in y[n]. The original signal over the window $k+1 \le n \le k+N$ can
be unambiguously reconstructed using the inverse Fourier transform

$$y[n] = \frac{1}{N}\sum_{\omega=0}^{2\pi(N-1)/N} \mathrm{FFT}[\omega,k]\,e^{i\omega n}, \qquad k+1 \le n \le k+N. \qquad (3.28)$$
FIGURE 3.12: The effects of aliasing are shown. A test signal with 10 and
15Hz components, but sampled at $F_s$ = 17Hz (1 second equals N = 17
samples), is used. Because the Nyquist criterion is not observed the frequency
content of the signal is not reflected in the normalized PSD. With $F_s$ = 17Hz
the PSD is defined for frequencies up to 8.5Hz.


When computing Equation 3.27, the longer y[n] is, the finer the resolution
of the FFT. For example, if $F_s$ = 512Hz and N = 512 samples are used, then
the resulting frequency resolution is 1Hz because there are 512 values evenly
distributed between 0 and 512Hz. If N = 1024 samples, then the resolution
is 0.5Hz.
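The relation between bin index, ω and Hz, and the resulting resolution $F_s/N$, can be checked numerically. This sketch assumes NumPy; bin_frequencies_hz is a hypothetical helper, and np.fft.rfftfreq is NumPy's helper returning the bin frequencies of the first half of the spectrum (0 to $F_s/2$).

```python
import numpy as np

Fs = 512.0                                   # sampling rate in Hz

def bin_frequencies_hz(N, Fs):
    """Hz of each FFT bin: omega_m = 2*pi*m/N, then f_m = omega_m * Fs / (2*pi)."""
    m = np.arange(N)
    omega = 2 * np.pi * m / N                # radians per sample, 0 <= omega < 2*pi
    return omega * Fs / (2 * np.pi)          # equals m * Fs / N

for N in (512, 1024):
    f = bin_frequencies_hz(N, Fs)
    print(N, f[1] - f[0])                    # resolution Fs / N: about 1 Hz, then 0.5 Hz

# First half of the spectrum only, 0 .. Fs/2, as kept in practice
half = np.fft.rfftfreq(1024, d=1 / Fs)
```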

The second half of FFT[ω, k] can be discarded: for a real signal y[n] it is
the complex conjugate mirror image of the first half and carries no additional
information, since frequencies beyond the Nyquist frequency F_s/2 cannot be
represented, as explained in Section 3.2. If frequencies above F_s/2 are not
removed from y[n] prior to sampling, then aliasing occurs: that content folds
back and appears as spurious lower-frequency content. The effects are shown in
Figure 3.12. This is measurement error that cannot be removed after sampling.
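The aliasing in Figure 3.12 can be reproduced in a few lines. This sketch, assuming NumPy, samples the 10Hz and 15Hz test tones at F_s = 17Hz for one second; because both lie above F_s/2 = 8.5Hz, their energy reappears at 17 - 10 = 7Hz and 17 - 15 = 2Hz.

```python
import numpy as np

fs = 17                                   # F_s = 17 Hz, as in Figure 3.12
n = np.arange(fs)                         # N = 17 samples, i.e. 1 second
y = np.sin(2 * np.pi * 10 * n / fs) + np.sin(2 * np.pi * 15 * n / fs)
psd = np.abs(np.fft.rfft(y)) ** 2         # bins at 0, 1, ..., 8 Hz
freqs = np.fft.rfftfreq(fs, d=1 / fs)
peaks = np.sort(freqs[np.argsort(psd)[-2:]])
print(peaks)                              # the two strongest bins: 2 Hz and 7 Hz
```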

A related statistic is how much power each single frequency component
contributes to the overall signal y[n]. This power spectral density (PSD) is
defined as a function of ω as

$$ PSD[\omega,k] = |FFT[\omega,k]|^2, \tag{3.29} $$

where |·| denotes the magnitude (absolute value) of a complex number. The
notion of power is preserved between the time and frequency domains: the
zero-lag autocorrelation in Equation 3.14 is equivalent, up to a constant
scale factor, to the sum of PSD[ω, k] over all ω (Parseval's theorem).
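Parseval's theorem is what ties time-domain power to the PSD. A quick numerical check, assuming NumPy and the unnormalized FFT convention of np.fft.fft: the sum of squared samples equals the sum of PSD values over all N bins divided by N.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.standard_normal(512)             # one analysis window
psd = np.abs(np.fft.fft(y)) ** 2         # Equation 3.29 over all N bins
time_power = np.sum(y ** 2)              # zero-lag autocorrelation, unnormalized
freq_power = np.sum(psd) / y.size        # Parseval: the same power, up to 1/N
print(time_power, freq_power)
```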

Equation 3.29 is not normalized, and comparing signals of different amplitude
and power can be misleading. A relative statistic can be derived by
normalizing to the total signal power, that is, by dividing Equation 3.29 by

$$ \text{Normalizing Factor} = \sum_{\omega} PSD[\omega,k]. \tag{3.30} $$


FIGURE 3.13: Effects of averaging to correctly represent the frequency
content of noisy signals. One average (P = 1 in Equation 3.31) does not show
the 10, 20 and 30Hz components. Using P = 10 does. The test signals are
sampled at F_s = 512Hz and thus the PSD is defined up to 256Hz. The plot
only shows frequencies between 0-100Hz. The averaged PSD is then normalized
using the normalizing factor in Equation 3.30.


With this type of normalization the PSD does not show how much power is in
the signal, only the relative contribution of each frequency component. If it
is important to compare the power of two signals then, so long as the two
have been correctly normalized to the same scale, a complementary measure
such as those presented in Section 3.3.1 should be used.
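The following sketch, assuming NumPy and a one-sided PSD (bins up to F_s/2 only), applies the normalization of Equation 3.30 to a signal and to a copy 20 times larger. The normalized PSDs are identical even though the raw power differs by a factor of 400, which is why a separate amplitude or power measure is needed. The function name normalized_psd is hypothetical.

```python
import numpy as np

def normalized_psd(y):
    """One-sided PSD (Equation 3.29) divided by its total (Equation 3.30)."""
    psd = np.abs(np.fft.rfft(y)) ** 2
    return psd / psd.sum()

fs = 512
t = np.arange(fs) / fs
y = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)
quiet, loud = normalized_psd(y), normalized_psd(20 * y)

# Raw (unnormalized) power differs by 20^2 = 400 between the two signals.
raw_ratio = np.sum(np.abs(np.fft.rfft(20 * y)) ** 2) / np.sum(np.abs(np.fft.rfft(y)) ** 2)
print(np.max(np.abs(quiet - loud)), raw_ratio)
```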

The statistical fluctuations in time domain signals make a single instance of
a computed PSD, normalized or otherwise, unrepresentative of the frequency
content of y[n]. Looking at averages over many segments is again more
indicative of a general trend. If the N samples in the analysis window of y[n]
are divided into P segments of equal length, then an average PSD can be
computed as

$$ PSD[\omega,k] = \frac{1}{P} \sum_{p=0}^{P-1} |FFT_p[\omega,k]|^2, \tag{3.31} $$

where FFT_p[ω, k] is computed by Equation 3.27 applied to the p-th segment
of length N/P. To understand the importance of this averaging, consider a
mixture of 10, 20 and 30Hz sinusoids embedded in a significant level of noise.
The process of computing the PSD for P = 1 and P = 10 with N = PF_s is
shown in Figure 3.13. Using P = 1 provides a PSD estimate with finer
frequency resolution, but it cannot isolate the important frequencies because
the estimate is too noisy. With P = 10 the dominant frequencies 10, 20 and
30Hz show peaks well above the background noise, which is averaged out, at
the expense of reduced frequency resolution. For a signal with N samples the
trade-off is then between better resolution (P smaller) and better
characterization through a greater number of averages (P larger). Too small
a P and the PSD may look noisy. Too large a P and not enough frequencies are
resolved.
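Equation 3.31 with non-overlapping, unwindowed segments is what is often called Bartlett's method (Welch's method adds windowing and overlap). A minimal sketch, assuming NumPy; averaged_psd is a hypothetical helper, and the noise level and random seed are arbitrary illustrative choices. The test signal mirrors Figure 3.13: 10, 20 and 30Hz tones in strong noise, with P = 10 one-second segments.

```python
import numpy as np

def averaged_psd(y, P):
    """Equation 3.31: mean of |FFT|^2 over P equal, non-overlapping segments of y."""
    L = len(y) // P                               # samples per segment, N/P
    segments = np.reshape(y[:L * P], (P, L))      # one row per segment p
    spectra = np.fft.rfft(segments, axis=1)       # FFT_p[w, k], bins up to F_s/2
    return np.mean(np.abs(spectra) ** 2, axis=0)

fs, P = 512, 10
t = np.arange(P * fs) / fs                        # N = P * F_s samples
rng = np.random.default_rng(0)
clean = sum(np.sin(2 * np.pi * f * t) for f in (10, 20, 30))
y = clean + 3.0 * rng.standard_normal(t.size)     # strong additive noise

psd = averaged_psd(y, P)
freqs = np.fft.rfftfreq(fs, d=1 / fs)             # 1 Hz bins for 1 s segments
top3 = np.sort(freqs[np.argsort(psd)[-3:]])
print(top3)
```

With one-second segments the bins are 1Hz apart; for a fixed N, raising P smooths the estimate further but widens the bins.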

FIGURE 3.14: The PSD of non-stationary signals. (a) shows the PSD of a
chirp, described in Figure 3.8(a). It is obvious that the frequency content
is between 10-25Hz, but it is not possible to say at what times this content
occurs. (b) is a spectrogram that can localize both frequency and temporal
aspects of the chirp. Spectrograms compute the PSD of short-time windows.
(c) shows the spectrogram of an EEG sequence. A seizure can be identified
at 620 seconds (between the dashed lines). Tracking in (b) and (c) is done
using 0.5 second (N = 256 samples with sampling frequency F_s = 512Hz)
non-overlapping analysis windows. The PSD at each 0.5 second window is
defined for frequencies up to 256Hz, at 2Hz resolution, of which only the
0-40Hz range is shown.


A final problem is once again that of stationarity. The PSD computed from
a signal y[n] that is non-stationary over the analysis window n = k+1, k+2,
..., k+N can only reflect its frequency content in a general way. Consider
again the chirp signal shown in Figure 3.8(a). This signal's frequency content
evolves from 10Hz to 25Hz over an analysis window, assumed to be 20 seconds
for the computations here (N = 10240 samples when F_s = 512Hz). Its
normalized PSD is shown in Figure 3.14(a), where it is evident that there is
activity over the entire 10-25Hz range.

However the PSD is by design incapable of isolating the time at which 
these frequencies occur. One way to see this is by again inspecting the basis 
functions used to decompose y[n]. The time domain basis function in Equation 
3.25 can define y[n] for every single sample, regardless of the sampling rate. 
The sinusoidal basis functions used in the FFT computation require many 
more data samples to identify frequency content. Thus frequency resolution 
is gained at the expense of temporal resolution, and changes in frequency over 
the analysis window are always lumped together. 

To track the temporal evolution of the frequency content we can instead 
draw a spectrogram. This is the sequential computation of the PSD of a signal 
for smaller analysis windows over time. In Figure 3.14(b) the analysis win- 
dow used is now 0.5 seconds (N = 256 samples) long, using non-overlapping 
segments. In this figure the evolution from 10 to 25Hz, and the time at which 
these changes happen, is visible. The windows must be long enough to allow 
appropriate computation of Equation 3.31 (with sufficient averaging and high 
enough frequency resolution), yet short enough to correctly characterize the 
evolution of a non-stationary signal. 
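A spectrogram is just the PSD recomputed on consecutive short windows. The sketch below assumes NumPy and follows the settings of Figure 3.14(b): 0.5 second (256-sample) non-overlapping windows at F_s = 512Hz, a Hanning window per segment and per-window normalization by Equation 3.30. For brevity it computes a single FFT per window (P = 1) rather than averaging, and the linear chirp is a stand-in for the signal of Figure 3.8(a), whose exact definition is not given here. The helper name spectrogram is hypothetical.

```python
import numpy as np

def spectrogram(y, fs, win_len):
    """Normalized PSD of consecutive non-overlapping windows of y."""
    n_win = len(y) // win_len
    frames = y[: n_win * win_len].reshape(n_win, win_len)
    frames = frames - frames.mean(axis=1, keepdims=True)   # remove DC per window
    frames = frames * np.hanning(win_len)                  # Hanning window
    psd = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    psd /= psd.sum(axis=1, keepdims=True)                  # Equation 3.30 per window
    freqs = np.fft.rfftfreq(win_len, d=1 / fs)
    times = (np.arange(n_win) + 0.5) * win_len / fs        # window centres in s
    return times, freqs, psd

fs, T = 512, 20.0
t = np.arange(int(T * fs)) / fs
# Linear chirp whose instantaneous frequency rises from 10 Hz to 25 Hz over T.
y = np.sin(2 * np.pi * (10 * t + 15 * t ** 2 / (2 * T)))
times, freqs, psd = spectrogram(y, fs, win_len=256)
peak_freq = freqs[np.argmax(psd, axis=1)]
print(peak_freq[:3], peak_freq[-3:])
```

The peak frequency of each row climbs from about 10Hz to about 25Hz over the 40 windows, and each 256-sample window resolves 2Hz bins up to F_s/2 = 256Hz.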

PSD analysis is an important tool used to understand the static and dy- 
namic properties of the EEG, where static properties refer to locally stationary 
behavior, and dynamic properties aim to capture the time-evolving nature of 
the EEG. Figure 3.14 describes dynamic properties, and an example in which 
significant differences exist between the spectrogram of normal and seizure 
EEG is shown in (c). 

FIGURE 3.15: Example PSDs for EEG sequences. (a-f) show examples of the
PSD of typical EEG sequences. Notice that even in normal states the EEG can
look significantly different in (a), the lower frequency peaks of sleep EEG in
(b), and the difficulty in differentiating epileptic EEG from normal sequences
in (c). Artifact is easier to identify, particularly using the higher frequency
range, as shown in (d). In (e) bipolar referencing is shown to de-emphasize
low frequencies whilst emphasizing high frequencies. Finally the higher energy
intra-cranial signals and low pass filtering resulting from spatial averaging of
the skull and scalp are shown in (f) with simultaneous intra-cranial and scalp
EEG records. All PSDs except in (f) are normalized to total PSD energy. All
PSDs are computed on 1 second segments (N = 512 samples with F_s = 512Hz)
for as much data as are available. An average of all segments is performed
and the resulting PSD is normalized. The PSD is defined for frequencies up
to 256Hz in all cases, even when a smaller range is shown.


The static properties of the EEG are highlighted in Figure 3.15, where the
PSDs of typical EEG sequences described in Chapter 1 are computed. All
data used are sampled at F_s = 512Hz, and PSDs are computed on detrended
data to which a Hanning window was applied prior to analysis. PSDs were
computed to a resolution of 1Hz, meaning that N = 512 samples (1 second)
were used for each computation. Given this, as many averages were taken as
the available data allowed. For example, in (a) 30 seconds of data were used,
thus there were P = 30 averages used to compute the PSD. In all cases at
least 10 seconds of data were available and thus P ≥ 10 averages in Equation
3.31. Important observations from this figure include:


• PSDs of typical EEG sequences: In Figure 3.15(a-d) the differences
in PSD estimates between normal awake, normal asleep, pathological
and artifactual activity reflected in common scalp recorded EEG are
shown. Plots are computed with data taken from the same patient,
unless otherwise stated. The normal and sleep data used to compute
the PSD of (a) were as in Figure 1.9, and the epileptic data in (c) are taken
from Figure 1.10.


(a) shows that the EEG is non-stationary because even the normal awake 
state in the same patient can vary significantly. One trace shows the 
pronounced alpha peak observed during the alpha rhythm, described in 
Chapter 1, while two other traces that seem very different in the time 
domain (shown in Figure 1.9) have almost identical PSDs once they 
have been normalized. The observed differences occur largely at higher 
frequencies, in itself an indication of non-stationarities perhaps driven 
by varying inputs. Selecting a subset of channels can emphasize certain 
features of the EEG. For example it is the occipital electrodes that are 
most affected by the alpha rhythm. When the PSDs of this subset of
electrodes are calculated, a 10Hz peak is more pronounced.


(b) shows that the frequency content of the sleep EEG differs from that
of the normal awake EEG. Sleep EEG is known to contain slower,
larger-amplitude activity than awake EEG. Although the amplitude
difference cannot be seen here because the PSD energies have been
normalized by Equation 3.30, in general low frequencies contribute more
than high frequencies during sleep. The alpha peak is absent during
sleep, although different peaks due to spindles and REM may appear at
times.


(c) shows that there are differences between epileptic and normal EEG. 
A segment from two different seizures, one with fundamental frequency 
4Hz and another one with peaks at 1 and 15Hz, is shown. The latter 
can be hard to distinguish from normal EEG using PSD alone, hence 
the difficulties involved in seizure detection. Other measures such as 
synchronization, periodicity and energy must supplement frequency es- 
timates. 


Finally, (d) gives an idea of how common artifacts may be differentiated
from normal EEG. In general, artifact that is muscular in nature
(EMG or chewing) involves higher contributions from frequencies above
roughly 40Hz, although it looks similar to normal activity in the 1-40Hz
range. EMG is not always separable from brain activity. Other
common artifacts such as EOG (ocular/blinking artifact) have relatively
higher contributions from frequencies below 2Hz and less contribution from
frequencies between 10-25Hz. One of these traces also shows a very
large peak at 50Hz and its harmonics (100Hz, 150Hz, 200Hz, ...). These
exist because of interference from electrical equipment that is powered
by a 50Hz AC supply⁷. If analysis overlaps the 50Hz frequency range,
a notch filter that removes this contribution should be applied to the
data.


⁷This peak occurs at 60Hz in some countries, including the USA and parts of Japan.
As an aside, notice the striking difference in the way the time and
frequency domains describe the information in a signal; yet, because the
transform is invertible, both domains contain the same information.


• Effects of referencing system: Figure 3.15(e) shows that the referencing
system used can affect the PSD distributions. These effects have
been described in detail in Chapter 2; here we show that experiment
agrees with theory. In Figure 2.17 it was predicted that bipolar
referencing emphasizes high frequencies and attenuates low frequencies. This
is supported in Figure 3.15(e). Also shown is the localizing nature of
bipolar referencing, which de-emphasizes the alpha peak because frontal
electrodes are used. Some of these features are also observed when an
average reference (taken over all available electrodes) is used, although the
effects are not as pronounced because the spatial sampling frequency is
too low in 21-channel scalp recorded EEG. Furthermore, the alpha peak
is emphasized rather than de-emphasized, because an average montage
considers both frontal and occipital electrodes.


• Effects of the skull: Figure 3.15(f) shows the effects that the skull has on
EEG records, with PSDs computed from simultaneous intra-cranial and
scalp records during a normal awake state. These PSD estimates have not
been normalized, in order to show, first, the larger amplitude/energy of
intra-cranial versus scalp EEG at low frequencies; second, that high
frequency content is roughly the same; and third, that low pass filtering of
frequencies above about 20Hz occurs between cortex and scalp. These
effects were all predicted in Chapter 2.
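The notch filter recommended for mains interference in item (d) above can be sketched as a second-order IIR filter with zeros on the unit circle at ±50Hz and poles just inside it. This is a common textbook design rather than a filter specified in this chapter. The sketch assumes NumPy; the helper notch_filter and the pole radius r = 0.98 are illustrative assumptions (a radius closer to 1 gives a narrower notch but a longer start-up transient), f0 should be set to 60Hz where that is the mains frequency, and the harmonics at 100Hz, 150Hz, ... would need further notches.

```python
import numpy as np

def notch_filter(y, fs, f0=50.0, r=0.98):
    """Second-order IIR notch: zeros at +/-f0 on the unit circle, poles at radius r."""
    w0 = 2 * np.pi * f0 / fs
    b = [1.0, -2 * np.cos(w0), 1.0]               # numerator: removes f0 exactly
    a = [1.0, -2 * r * np.cos(w0), r ** 2]        # denominator: keeps the notch narrow
    out = np.zeros(len(y))
    for n in range(len(y)):
        acc = b[0] * y[n]
        if n >= 1:
            acc += b[1] * y[n - 1] - a[1] * out[n - 1]
        if n >= 2:
            acc += b[2] * y[n - 2] - a[2] * out[n - 2]
        out[n] = acc
    return out

fs = 512
t = np.arange(10 * fs) / fs
rhythm = np.sin(2 * np.pi * 10 * t)               # 10 Hz activity to keep
mains = 0.8 * np.sin(2 * np.pi * 50 * t)          # 50 Hz interference to remove
clean = notch_filter(rhythm + mains, fs)

segment = clean[2 * fs:]                          # drop the start-up transient
amp = 2 * np.abs(np.fft.rfft(segment)) / segment.size
bin_hz = fs / segment.size                        # 0.125 Hz per bin
amp_10, amp_50 = amp[int(10 / bin_hz)], amp[int(50 / bin_hz)]
print(f"10 Hz amplitude: {amp_10:.3f}, 50 Hz amplitude: {amp_50:.1e}")
```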


In the time domain a signal y[n] can be constructed as the linear 
combination of identity basis functions described by Equation 
3.25. In the frequency domain y[n] is expressed in terms of
sinusoidal basis functions capable of isolating activity at different
frequencies, as in Equation 3.27. Time domain and frequency 
domain representations contain exactly the same information, 
but the features accentuated in each domain differ. 


In particular, examples are shown where the power spectral 
density (PSD) of a time evolving signal can differentiate be- 
tween many typical states of EEG. The PSD is a statistic that 
estimates the amount of signal power caused by events of each 
frequency. It is computed as in Equation 3.29. Comparing the 
relative energies between two signals is possible when the nor- 
malizing term in Equation 3.30 is used. 
3.3.3 Time-Frequency Analysis


The time domain statistics in Section 3.3.1 fail to provide sufficient frequency
content information, an aspect crucial to EEG classification, and the frequency
domain techniques in Section 3.3.2, designed for stationary processes, can
provide temporal information only by windowing y[n]. The difficulty is in
selecting appropriate window sizes so that results are optimal and stationarity
assumptions are not violated. In this section statistics designed to resolve both
temporal and frequency content for non-stationary signals are presented. Several
approaches exist, including Gabor atoms and Wigner-Ville distributions, but
by far the one that has received the most attention in EEG is wavelet analysis.
A wavelet ψ[n] is a function that [143]


1. Integrates to zero: $\sum_{n=-\infty}^{\infty} \psi[n] = 0$, and


2. Has finite energy: $\sum_{n=-\infty}^{\infty} |\psi[n]|^2 < \infty$.

By themselves these properties do not make wavelets very special. With 
appropriate use, however, wavelets can be used as basis functions to provide a 
combination of both temporal and frequency information. Observe that time 
domain (identity) basis functions in Equation 3.25 violate the first condition, 
whilst frequency domain (sinusoidal) basis functions in Equation 3.27 violate 
the second. 

Some definitions are needed before this can be understood. Let ψ_{a,b}[n]
represent a wavelet function that is shifted (or translated) by b samples and
scaled by a, that is,

$$ \psi_{a,b}[n] = \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{n-b}{a}\right). \tag{3.32} $$

When a = 1 and b = 0, ψ_{1,0}[n] is known as the mother wavelet. With
0 < a < 1 the mother wavelet contracts in time, and when a > 1 ψ_{a,b}[n]
stretches in time: the larger a is, the longer the wavelet. Each value of a
represents a different scale at which temporal and frequency content can be
extracted with different resolutions.

It is this scaling property that makes wavelets so powerful. A wavelet that 
is more compact in time can obtain new information in fewer samples than 
a wavelet that is temporally longer. Therefore smaller values of a give finer 
or more detailed temporal information than larger values of a. This is known 
as the temporal scaling. Temporal scaling for the Daubechies-4 (D4) wavelet 
proposed by Ingrid Daubechies in 1987 [143] is presented in Figure 3.16(a). 
Scales a = 2^2, 2^3, ..., 2^6, all with b = 0, are shown.

Now consider the frequency content of wavelets for different values of a. 
This is shown in Figure 3.16(b), where the PSD as computed by Equation 
3.29 is applied to the wavelet at each scale. It is shown that different scales 
occupy different frequency regions. Larger values of a occupy lower regions 
of the frequency spectrum, and smaller values of a occupy higher frequencies. 
FIGURE 3.16: Wavelet Fundamentals. (a) and (b) respectively show the
time and frequency scaling properties of the Daubechies-4 (D4) family for
scales m = 2-6 and b = 0. Smaller scales give better temporal resolution
but less frequency resolution, whereas larger scales do the opposite. In the
frequency domain, larger scales occupy lower frequency regions, hence their
ability to isolate behavior at different frequencies. The wavelets in (a) have
been normalized to uniform amplitude. (c) shows the time-frequency resolution
of wavelets, that is, the uncertainty principle of time-frequency analysis.
Resolution can be high for either time or frequency, but not both.
Intermediate scales show intermediate resolutions for both. Signals in this
figure are sampled at F_s = 512Hz, thus 1 second is N = 512 samples.
Furthermore the bandwidth of each wavelet, that is, the range of the
frequency spectrum most relevant for each wavelet, becomes smaller for larger
a, so finer frequency details can be distinguished with larger values of a.
This is known as the frequency scaling of wavelets. The factor 1/√a in
Equation 3.32 ensures that the wavelet has the same energy at all scales.
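The scaling relations of Equation 3.32 can be checked numerically. D4 has no closed-form expression (it is usually generated from filter coefficients such as those in Table 3.1), so this sketch swaps in the Mexican-hat (Ricker) wavelet, ψ(t) = (1 - t²)e^(-t²/2), purely for illustration. Assuming NumPy, it checks for several scales that ψ_{a,b}[n] sums to zero, that the 1/√a factor keeps its energy constant, and that doubling a halves the frequency at which its spectrum peaks. The helper names ricker and scaled_wavelet are hypothetical.

```python
import numpy as np

def ricker(t):
    """Mexican-hat mother wavelet (a stand-in for D4 in this sketch)."""
    return (1 - t ** 2) * np.exp(-t ** 2 / 2)

def scaled_wavelet(n, a, b):
    """psi_{a,b}[n] = psi((n - b) / a) / sqrt(a), as in Equation 3.32."""
    return ricker((n - b) / a) / np.sqrt(a)

fs = 512
n = np.arange(-2048, 2048)                      # long enough to hold every scale
freqs = np.fft.rfftfreq(n.size, d=1 / fs)       # 0.125 Hz bins
results = {}
for a in (4, 8, 16):
    psi = scaled_wavelet(n, a, b=0)
    peak_hz = freqs[np.argmax(np.abs(np.fft.rfft(psi)))]
    results[a] = (psi.sum(), np.sum(psi ** 2), peak_hz)
    print(f"a={a:2d}  sum={psi.sum():+.1e}  energy={np.sum(psi ** 2):.4f}  peak={peak_hz:.2f} Hz")
```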


ψ_{a,b}[n], described by Equation 3.32, is a function known as a
wavelet that for small values of a has fine temporal resolution 
and coarse frequency resolution. Fine frequency resolution but 
coarse temporal resolution can be obtained with larger values 
of a. Resolution can be high for either temporal or frequency 
information, but not both. This is known as the uncertainty 
principle of time-frequency analysis, shown graphically in Figure 
3.16(c). Depending on the application a suitable compromise 
must be reached with intermediate values of a. 


Temporal evolution using wavelets can be obtained by shift- 
ing the wavelets with different values of b. This allows them to 
characterize the non-stationary nature of signals. 


So far discussion has focused on a wavelet and its properties. To use
wavelets to obtain temporal and frequency information from a signal y[n] we
must introduce the wavelet transform W_{a,b}[n, k]. For each value of a and b
it is defined over the analysis window as

$$ W_{a,b}[n,k] = \sum_{\tau=k+1}^{k+N} y[\tau]\, \psi_{a,b}[n-\tau], \qquad k+1 \le n \le k+N. \tag{3.33} $$


When applied to y[n] the wavelet transform retains only the frequency
content that is within the same bandwidth as that of ψ_{a,b}[n]. This operation
is known as convolution and its selective attenuation of the frequencies in y[n] 
is known as filtering. More detailed information on these concepts including 
their broad applicability can be found in [138]. 
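Equation 3.33 is an ordinary convolution, so np.convolve can evaluate it for every shift at once. This sketch reuses the Mexican-hat stand-in (not D4) and assumes NumPy; the helper wavelet_transform, the ±8a truncation of the wavelet and the two test tones are illustrative assumptions. A small scale keeps the 64Hz tone and suppresses the 4Hz one, while a large scale does the reverse, which is the filtering described above.

```python
import numpy as np

def ricker(t):
    """Mexican-hat mother wavelet, a stand-in for D4 in this sketch."""
    return (1 - t ** 2) * np.exp(-t ** 2 / 2)

def wavelet_transform(y, a):
    """Convolve y with psi_a (Equation 3.33 with b = 0); one output per sample."""
    n = np.arange(-8 * a, 8 * a + 1)          # wavelet truncated at +/- 8a samples
    psi = ricker(n / a) / np.sqrt(a)
    return np.convolve(y, psi, mode="same")

fs = 512
t = np.arange(8 * fs) / fs
low = np.sin(2 * np.pi * 4 * t)               # slow 4 Hz rhythm
high = np.sin(2 * np.pi * 64 * t)             # fast 64 Hz rhythm
y = low + high
inner = slice(300, -300)                      # skip edges where the wavelet overhangs

def rel_err(x, ref):
    return np.linalg.norm(x[inner] - ref[inner]) / np.linalg.norm(ref[inner])

# Large scale (a = 32): the output should match the response to the 4 Hz tone alone.
err_coarse = rel_err(wavelet_transform(y, 32), wavelet_transform(low, 32))
# Small scale (a = 2): the output should match the response to the 64 Hz tone alone.
err_fine = rel_err(wavelet_transform(y, 2), wavelet_transform(high, 2))
print(f"a=32 leak from 64 Hz: {err_coarse:.1e}, a=2 leak from 4 Hz: {err_fine:.1e}")
```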

Orthogonal or non-redundant wavelets are those whose dilated and translated
versions are mutually orthogonal, so that the information captured at
different scales does not overlap and, when the wavelet transform is applied,
the resulting information is different at each scale. Orthogonal wavelets are
obtained through dyadic sampling, that is, by restricting the dilations and
translations of a suitably designed mother wavelet to a = 2^m and b = 2^m l
with m and l both integers. If only orthogonal wavelets are allowed then a
signal y[n] is unambiguously represented by its decomposition into the
different bases, so that [65]
FIGURE 3.17: Wavelet Decimation. The figure shows the decimation procedure
for wavelet decomposition into the components of Equation 3.34. The
coefficients d[m, l] and r[A+1, l] are computed for each value of m by
iteratively filtering the previous stage result using filters g[n] and h[n]
derived from the coefficients in Table 3.1, and then down-sampling the result
by a factor of 2.


Daubechies-4 (D4) Filter Coefficients 


0.1629 0.5055 0.4461 -0.0198 -0.1323 0.0218 0.0233 -0.0075 


TABLE 3.1: Example wavelet filter coefficients. 


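$$ y[n] = \sum_{l=-\infty}^{\infty} \left( \sum_{m=1}^{A} d[m,l]\, \psi_{2^m,\,2^m l}[n] + r[A+1,l]\, \phi_{2^A,\,2^A l}[n] \right), \tag{3.34} $$

where φ_{a,b}[n] is the scaling function associated with the wavelet family,
scaled and shifted as in Equation 3.32.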


d[m, l] are known as the wavelet coefficients for scales a = 2^m,
m = 1, 2, ..., A, and r[A+1, l] is the remainder of the signal, representative
of all scales larger than a = 2^A. The collection of d[m, l] and r[A+1, l]
unambiguously describes y[n], and y[n] can be quickly and efficiently
reconstructed exactly if all these coefficients are known [152, 151]. For
windowed time domain signals of length N the first summation reduces to
k+1 ≤ l ≤ k+N. The D4 wavelet family shown in Figure 3.16(a) is orthogonal
and can be used for the above analysis.

The values of d[m, l] and r[A+1, l] can be iteratively computed using
cascaded filters. The process is shown and explained in Figure 3.17. This 
is simpler and faster than applying Equation 3.33 directly because the entire 
process, including calculation of wavelets and filters at each scale, can be 
performed by storing a few coefficients that describe the wavelet family. For 
the D4 family these coefficients are listed in Table 3.1. The interested reader 
can find detailed explanations of why this system works in [102] and [143]. 
Research into faster decimation also exists (see, for example, [199]). 
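The cascade of Figure 3.17 can be sketched with the Table 3.1 coefficients. This is a minimal sketch assuming NumPy, periodic extension at the window edges and the usual quadrature-mirror construction g[n] = (-1)^n h[L-1-n] for the high-pass filter; the Table 3.1 values are multiplied by √2 so that the filters have unit energy. The helper names are hypothetical. Because the printed coefficients are rounded to four decimals, reconstruction and energy preservation hold only to roughly 10^-3 rather than to machine precision.

```python
import numpy as np

# Table 3.1 coefficients scaled by sqrt(2) so that the low-pass filter h has unit energy.
h = np.sqrt(2) * np.array([0.1629, 0.5055, 0.4461, -0.0198,
                           -0.1323, 0.0218, 0.0233, -0.0075])
# Quadrature-mirror high-pass filter: g[n] = (-1)^n h[L-1-n].
g = h[::-1] * (-1.0) ** np.arange(h.size)

def filter_positions(N, L):
    """Indices covered by the filter at each output, advancing 2 samples (periodic)."""
    return (2 * np.arange(N // 2)[:, None] + np.arange(L)) % N

def analysis_step(x):
    """Filter with h and g and keep every second output: one stage of Figure 3.17."""
    blocks = x[filter_positions(x.size, h.size)]
    return blocks @ h, blocks @ g           # (remainder r, detail d)

def synthesis_step(r, d):
    """Undo analysis_step by adding back the shifted filters weighted by r and d."""
    x = np.zeros(2 * r.size)
    np.add.at(x, filter_positions(x.size, h.size), r[:, None] * h + d[:, None] * g)
    return x

def wavedec(y, levels):
    """Return [d1, ..., dA, r_{A+1}] by iterating analysis_step on the remainder."""
    coeffs, r = [], np.asarray(y, dtype=float)
    for _ in range(levels):
        r, d = analysis_step(r)
        coeffs.append(d)
    return coeffs + [r]

rng = np.random.default_rng(0)
y = rng.standard_normal(512)
y_rec = synthesis_step(*analysis_step(y))
coeffs = wavedec(y, levels=5)
energy = sum(np.sum(c ** 2) for c in coeffs)
print(np.max(np.abs(y - y_rec)), energy / np.sum(y ** 2))
```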

FIGURE 3.18: Example applications of wavelet filters. (a) shows the relative
energy found in each frequency scale m = 2-7 for different states of EEG.
The trends shown are the same as in Figure 3.15, demonstrating the power
of wavelets to display similar information as the PSD but more compactly.
(b) shows wavelet decimation from Figure 3.17 applied to an EEG sequence.
The seizure occurring at 620 seconds (between the dashed lines) is marked
by an obvious change, signifying the applicability of wavelet theory to seizure
recognition. Signals are sampled at F_s = 512Hz, thus 1 second is N = 512
samples. Tracking in (b) is performed using 2 second (N = 1024 samples)
non-overlapping windows.


Wavelet filters isolate spectral information in different frequency ranges
and can describe the EEG in similar ways to the PSD. A histogram of the
relative average power at scales m = 2 to m = 7 for typical EEG sequences
is shown in Figure 3.18(a). The frequency ranges for each scale correspond
to those in Figure 3.16(b). A normalized measure is once more obtained by
dividing by the total energy at all scales. The trends shown in this figure are
the same as those observed in Section 3.3.2, but the representation is more
compact. An example of the relative energy of the different wavelet scales
used to track changes over time is shown in Figure 3.18(b). Clear differences
exist during the seizure.
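Relative energy per scale, as in Figure 3.18(a), is the energy of each set of coefficients divided by the total over all scales. To keep this sketch short and self-contained it swaps in the two-tap Haar wavelet for D4 (a different and much less frequency-selective orthogonal wavelet) and assumes NumPy; haar_relative_energy is a hypothetical helper. At F_s = 512Hz, scale m covers roughly F_s/2^(m+1) to F_s/2^m, so a 96Hz tone falls in the 64-128Hz band of scale m = 2. That scale receives the largest share, with Haar's broad filters spilling some energy into the neighbouring scales.

```python
import numpy as np

def haar_relative_energy(y, levels):
    """Relative energy of Haar details d1..dA and of the remainder r_{A+1}."""
    energies, r = [], np.asarray(y, dtype=float)
    for _ in range(levels):
        even, odd = r[0::2], r[1::2]
        d = (even - odd) / np.sqrt(2)      # upper half of the current band
        r = (even + odd) / np.sqrt(2)      # lower half, passed on to the next scale
        energies.append(np.sum(d ** 2))
    energies.append(np.sum(r ** 2))
    energies = np.array(energies)
    return energies / energies.sum()       # normalize by the total over all scales

fs = 512
t = np.arange(4 * fs) / fs
rel = haar_relative_energy(np.sin(2 * np.pi * 96 * t), levels=7)
print(np.round(rel, 3))                    # entries: m = 1..7, then the remainder
```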
3.3.4 Non-Linear Analysis


The statistics discussed so far are useful for extracting information from a 
signal y[n] assumed to be either (1) generated by a linear system with state 
z[n] or (2) generated by non-linear dynamics that can be approximated by 
a very large dimensional linear system. Many signals that are generated by 
non-linear systems are treated as linear simply because the analysis is simpler 
and the theory is more developed. However even simple non-linear systems 
can lead to very large dimensional linear approximations, and non-linear anal- 
ysis can be used to reveal information that may be missed. The hope is that 
by treating the system as non-linear a lower-dimensional (or simpler) repre- 
sentation of y[n] exists. 

Tools to analyze signals generated by a non-linear system with state z[n] 
are more useful now that fast computers are more readily available. Non- 
linear signal processing in the past few decades has developed significantly 
as a direct result of this advance in computer technology. Interest in the 
characterization of EEG from the perspective of non-linear statistics followed 
suit [70]: if the EEG signal is generated by a non-linear system [60], can the
non-linearity be used to advantage in classification tasks?

The problem is that non-linear statistics require significantly longer data
sequences to be estimated correctly. This has to be traded against the hope
that the dimension of the non-linear phase space is much smaller than the
linear one. In an environment where a signal y[n] is riddled with properties
such as non-stationarity and noise, non-linear analysis is difficult.

Nevertheless, non-linear dynamics are able to describe a much larger and 
richer range of behavior than linear ones. Attempting to understand an incred- 
ibly complex system such as the human brain requires analysis that reveals 
complex behavior, even if this analysis is limited. 


Wherever possible linear analysis should be used to extract
features from EEG data. The linear signal processing tools
available are far more mature and better understood than non-linear
ones. They generally require less data, making the assumed
stationarity more likely.


However, linear analysis has been tried and has largely failed 
to reveal aspects of EEG dynamics including the identification 
of a pre-seizure state. In these cases non-linear analysis may be 
useful, but results must be interpreted with care and limitations 
well understood. 


Assuming stationarity and the availability of sufficient data, a challenge
in estimating non-linear statistics is the presence of noise, discussed at the
beginning of Section 3.3. Noise can have considerable impact on the quality
of computations. In some cases noise reduction techniques must be applied to
the time series prior to computation [34].

This section presents some of the more commonly used methods in non- 
linear signal processing. These methods involve a two stage process in which 
the dynamics of a multidimensional z[n] must first be reconstructed from a sin- 
gle dimensional recording y[n]. This is known as embedding in the dynamical 
systems theory or observation theory in systems engineering. 

Once the properties of z[n] have been reconstructed its non-linear features
can be extracted. Three major categories of features exist: those that give
an idea of how complicated a system is, known as dimension N; those that
give an idea of how predictable a system is, known as Lyapunov exponents λ_i;
and those that give an idea of how random the system is, known as entropy
H. Each of these statistics is discussed next. Further detail than that
presented here, for both theoretical and practical aspects of this theory, can
be found in [2], [34], [82], [131], [159], [173] and [183].


3.3.4.1 Embedding Theory 


When the equations that describe the dynamics of the system z[n] are known 
then it is possible to compute its features analytically. Often these equations 
are not available and statistics must be estimated using experimentally col- 
lected data. This signal, y[n], is a single (or low) dimensional record that 
can be used to estimate statistics from a multidimensional system z[n]. The 
properties of z[n] must be reconstructed from y[n] to be able to capture the 
same dynamics of the underlying N dimensional system. 

A fundamental idea in non-linear analysis (Takens/Aeyels's theorem, [4,
179]) is that this reconstruction can be performed by taking time delayed
versions of y[n], that is [179],

$$ \tilde{z}[n] = \left(y[n],\, y[n+\tau],\, \ldots,\, y[n+(\tilde{N}-1)\tau]\right). \tag{3.35} $$

τ is the time delay, typically chosen by minimum mutual information tests
developed in [2], Ñ is the embedding dimension and z̃[n] is known as the
phase portrait of y[n] that spans an Ñ-dimensional phase space. Given
noise-free data, this reconstruction can unambiguously capture the
coordinate-invariant dynamics of the original N-dimensional system if the
embedding dimension Ñ ≥ 2N+1. In practice Ñ can usually be much smaller
[2], because typically the system state does not explore all of the phase space
but only a (much) lower dimensional subset of it. The presence of noise in
y[n] limits but does not invalidate this method [179]. Note also that, as
always, y[n] should be normalized as in Equation 3.13 prior to embedding.
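Equation 3.35 amounts to stacking delayed copies of y[n] as columns, one reconstructed state z̃[n] per row. A minimal sketch assuming NumPy; delay_embed is a hypothetical helper, dim plays the role of Ñ and tau of τ (in samples), and the normalization of Equation 3.13 is left to the caller.

```python
import numpy as np

def delay_embed(y, dim, tau):
    """Rows are (y[n], y[n+tau], ..., y[n+(dim-1)tau]), as in Equation 3.35."""
    y = np.asarray(y, dtype=float)
    n_vectors = y.size - (dim - 1) * tau      # states that fit inside the record
    if n_vectors <= 0:
        raise ValueError("signal too short for this embedding dimension and delay")
    columns = [y[i * tau : i * tau + n_vectors] for i in range(dim)]
    return np.column_stack(columns)

y = np.arange(10.0)                           # toy record so the rows are easy to read
Z = delay_embed(y, dim=3, tau=2)
print(Z)                                      # first row [0, 2, 4], last row [5, 7, 9]
```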

Formally, the reconstructed system z̃[n] is related to the original system
z[n] as

$$ z[n] = A(\tilde{z}[n]), \tag{3.36} $$

where A is some invertible map relating the original system z[n] to the
reconstructed system z̃[n]. Equation 3.2 can be re-written as⁸

$$
\begin{aligned}
z[n+1] &= P(z[n], x[n], u[n], n) \\
A(\tilde{z}[n+1]) &= P(A(\tilde{z}[n]), x[n], u[n], n) \\
\tilde{z}[n+1] &= A^{-1} P(A(\tilde{z}[n]), x[n], u[n], n) \\
\tilde{z}[n+1] &= \tilde{P}(\tilde{z}[n], x[n], u[n], n),
\end{aligned}
$$

where P̃ = A^{-1}PA is the new map in embedded space replacing the
original P.

The reconstructed system z̃[n] does not look the same as the real z[n],
but certain properties such as Lyapunov exponents, dimension and entropy
(discussed later) remain the same because these do not depend on the
coordinate system used. This reconstruction can be, and is, used to estimate
these parameters from a recorded signal.

The choice of Ñ is not trivial. Values of Ñ ≈ 5 or 6 are typical for
EEG analysis, usually justified by “good” results obtained for a particular
application. These values are used because it is convenient for non-linear
signal processing tools that z̃[n] be low dimensional (small Ñ). The dynamics
of highly complex non-linear systems are unlikely to be captured by the EEG
due to its low spatial resolution. A small N would indicate that the EEG is
capable of reproducing more of these dynamics.

However, desiring low dimensionality does not make it so. Proving the 
existence of low dimensional non-linearities is not easy, as shown in [145]. 
Whilst many papers ([10], [94], [145]) have concluded that there is sufficient 
indication of non-linear deterministic structure (as opposed to stochasticity) 
in the EEG, there has not been much evidence to support low dimensionality. 
In fact in Chapter 6 an incredibly simplified model is presented that in itself 
contains approximately 15 dimensions. This is quite something given that a 
simple 3-dimensional model of a neuron would suggest a brain with dimension 
N 3x 10°, and hence embedding dimensions potentially in the order of 
6 x 10°! Tests for the presence of low dimensional dynamics on both intra- 
cranial and scalp EEG were performed in [10] and [26] for segments that were 
weakly stationary. These studies showed that for intra-cranial EEG clear
evidence of non-linearity exists, but that strong indications of low
dimensionality exist only during an epileptic seizure. Scalp recordings
indicated non-linearity but no evidence of underlying low dimensionality,
either in the seizure or the non-seizure state. This suggests that either the
dynamics captured at the scalp are of great complexity (more neurons
contribute to the evoked potentials on the scalp) or that the characteristics
of non-linearities are filtered by the skull and scalp and hidden by noise.


⁸The map A relates systems of very different dimensions, and as such is not as simple
as implied by the mathematics presented here. However, for illustration purposes only, the
argument can be simplified in this way.
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Thus the choice of f£; ~ 5 or 6 in EEG analysis is driven by necessity 
rather than correctness. Consider a moderately low resolution EEG that 
digitizes data to 10 bits. With 5 dimensions, phase space covers (219)? z 101° 
possibilities. A typical weakly stationary 30 second EEG segment often used 
for analysis, sampled at 512Hz, produces data in the order of 10? samples — 10 
orders of magnitude less than the size of phase space! One would hope that 
the amount of relevant phase space is sufficiently small to be describable by 
such small volumes of data, although these assumptions are only justifiable for 
special cases like during epileptic seizures in which the waveforms are simpler. 
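The arithmetic behind this argument can be checked in a few lines. This is a minimal sketch; the helper name `phase_space_gap` is hypothetical, and it simply compares the number of quantized cells, (2^bits)^n̂, with the number of samples in one analysis window.

```python
import math

def phase_space_gap(bits, n_hat, seconds, fs):
    """Compare quantized phase-space cells with the samples in one window
    (hypothetical helper for illustration)."""
    cells = (2 ** bits) ** n_hat       # each of n_hat coordinates takes 2**bits values
    samples = int(seconds * fs)        # samples in the analysis window
    gap = math.log10(cells / samples)  # orders of magnitude between the two
    return cells, samples, gap

# 10-bit EEG, n_hat = 5, 30 s window sampled at 512 Hz
cells, samples, gap = phase_space_gap(bits=10, n_hat=5, seconds=30, fs=512)
print(f"{cells:.2e} cells, {samples} samples, gap of {gap:.1f} orders of magnitude")
# prints: 1.13e+15 cells, 15360 samples, gap of 10.9 orders of magnitude
```

Even if only a tiny fraction of the cells is ever visited, the data cannot populate a space of this size.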

Even so, a significant volume of literature uses such values with little or no admission of their shortcomings. The implicit assumption is that embedding dimensions of such low magnitude are able to capture the relevant dynamics, rather than all the dynamics, of z[n] and consequently ẑ[n]. This is not necessarily true, and the reader is encouraged to question the validity of results obtained with inadequate parametrization.

Another concern is computation time. For an EEG signal x_c[n], where c = 1, 2, ..., C_TOT channels are available and therefore C_TOT delay reconstructions are possible, the effective dimension of the overall phase space is n̂ × C_TOT, even though some redundancy exists. For computational efficiency there is a trade-off between C_TOT and n̂. In some works, such as [37], 40 standard PCs were necessary to allow real time computation of most of the algorithms presented here for C_TOT = 128. This is not always (or usually!) feasible, so a subset of channels (C_TOT = 4 or 5) at appropriate sites is selected for analysis.

The following sections assume that the choice of n̂, τ and C_TOT in the analysis of an arbitrary signal y[n] used to reconstruct a system ẑ[n] is adequate for the particular application.
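Every statistic in the rest of this section starts from a delay reconstruction. The sketch below assumes the forward-delay form ẑ[n] = (y[n], y[n+τ], ..., y[n+(n̂-1)τ]) for each channel; Equation 3.35 may order the delays differently, which does not change the geometry. The names `delay_embed` and `embed_channels` are hypothetical, and channels are simply placed side by side to show how the overall dimension grows as n̂ × C_TOT.

```python
import numpy as np

def delay_embed(y, n_hat, tau):
    """Rows are delay vectors (y[n], y[n + tau], ..., y[n + (n_hat - 1) tau]).

    The forward-delay ordering is an assumption made for illustration.
    """
    y = np.asarray(y, dtype=float)
    n_vec = len(y) - (n_hat - 1) * tau
    if n_vec < 1:
        raise ValueError("signal too short for this n_hat and tau")
    return np.column_stack([y[i * tau: i * tau + n_vec] for i in range(n_hat)])

def embed_channels(x, n_hat, tau):
    """Stack the reconstructions of C_TOT channels; x has shape (C_TOT, samples).

    The result has n_hat * C_TOT columns (hypothetical helper).
    """
    return np.hstack([delay_embed(xc, n_hat, tau) for xc in x])

# 4 channels of 30 s at 512 Hz, n_hat = 4, tau = 30 samples
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 15360))
z = embed_channels(x, n_hat=4, tau=30)
print(z.shape)  # (15270, 16): 15360 - 3 * 30 vectors in 4 * 4 dimensions
```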


3.3.4.2 Dimension — How Complex is a System? 


The dimension of a system, described in Section 3.1, is a measure of the complexity in the dynamics of z[n]. Dimension in Euclidean space is familiar and intuitive to most: it is the minimum number of coordinates required to unambiguously describe the location of a point in phase space ℝ^N. For example, the geographical location of a ball bouncing on a hill anywhere on the earth's surface can be defined by 3 variables: latitude, longitude, and height above sea level. In this case N = 3. The dimension of a dynamical system can also be the number of variables necessary to describe its behavior [131], which in the case of our bouncing ball will be greater than 3, because it is necessary to know the velocity of the ball (in each of the 3 spatial dimensions) and the aspects of the ball that define its movement and bounce (such as elasticity, volume, and the pressure and temperature of the gas inside it). Other definitions of dimension exist, but in all cases an estimate of the dimension of a system provides a feel for its complexity. The dimension of a non-linear system, regardless of definition, shall from now on be referred to as N.
The dimension N of non-linear systems can be a non-integer. The idea is that the actual dimension of a system is always an integer, but the region of phase space in which the orbits lie most of the time may occupy only a small volume of the entire space. To capture this, non-integer (fractal) dimensions have been introduced. Several types of fractal dimension have been defined, including the information dimension, capacity dimension and correlation dimension [131], of which only the latter is discussed here.

The correlation dimension, computed from the reconstructed state variable ẑ[n] derived from a time domain signal y[n] of length N (n = 1, 2, ..., N) using embedding dimension n̂, is given by:

$$
N = D_2 = \lim_{\hat{n}\to\infty}\,\lim_{\epsilon\to 0}\,\frac{\partial \ln C(\hat{n},\epsilon)}{\partial \ln \epsilon} \qquad (3.37)
$$

where C(n̂, ε) is the correlation sum [10]

$$
C(\hat{n},\epsilon) = \frac{2}{N(N-1)} \sum_{n_i < n_j} \Theta\big(\epsilon - \lVert \hat{z}[n_i] - \hat{z}[n_j] \rVert\big) \qquad (3.38)
$$

Θ(y) is the Heaviside step function, with a value of 1 if y > 0 and zero otherwise. The correlation sum gives the empirical probability that any two instances of ẑ[n] lie within a distance ε of each other [106]. For the limits in Equation 3.37 to exist, ergodicity is required; that is, a long enough observation of the signal must allow us to infer the system's properties.

Ideally, the calculation of Equation 3.37 requires an infinite amount of stationary data. To track changes over time, and to deal with the problems presented by the non-stationarities in z[n], only finite (and not very long) lengths of data can be used. In such cases an effective correlation dimension D_2^eff(n̂, ε) can be estimated over a limited range of ε in which ln C(n̂, ε) grows roughly linearly with ln ε. That is,

$$
D_2^{\mathrm{eff}}(\hat{n},\epsilon) = \frac{1}{N_\epsilon} \sum_{\epsilon=\epsilon_{\mathrm{lower}}}^{\epsilon_{\mathrm{upper}}} \frac{\partial \ln C(\hat{n},\epsilon)}{\partial \ln \epsilon} \qquad (3.39)
$$

where N_ε is the number of ε values in the interval [ε_lower, ε_upper]. Details on appropriate selection of ε_lower and ε_upper can be found in both [10] and [90].
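A minimal NumPy sketch of Equations 3.38 and 3.39 follows. It assumes Euclidean distances, logarithmically spaced scales and a finite-difference estimate of ∂ln C/∂ln ε; the function names are hypothetical. All pairs are formed explicitly, so memory grows as N², which is acceptable for a few thousand points but not for long recordings.

```python
import numpy as np

def delay_embed(y, n_hat, tau):
    """Rows are delay vectors (y[n], y[n + tau], ..., y[n + (n_hat - 1) tau])."""
    y = np.asarray(y, dtype=float)
    n_vec = len(y) - (n_hat - 1) * tau
    return np.column_stack([y[i * tau: i * tau + n_vec] for i in range(n_hat)])

def correlation_sum(z, eps_values):
    """Equation 3.38: fraction of pairs n_i < n_j whose distance is below eps."""
    n = len(z)
    i, j = np.triu_indices(n, k=1)                        # each pair counted once
    d = np.sort(np.linalg.norm(z[i] - z[j], axis=1))      # sorted Euclidean distances
    counts = np.searchsorted(d, eps_values, side="left")  # number of distances < eps
    return 2.0 * counts / (n * (n - 1))

def effective_correlation_dimension(y, n_hat, tau, eps_lower, eps_upper, n_eps=10):
    """Equation 3.39: mean local slope d ln C / d ln eps over [eps_lower, eps_upper]."""
    z = delay_embed(y, n_hat, tau)
    eps = np.geomspace(eps_lower, eps_upper, n_eps)       # log-spaced scales
    c = correlation_sum(z, eps)
    if np.any(c == 0):
        raise ValueError("no pairs closer than eps_lower: raise it or use more data")
    slopes = np.diff(np.log(c)) / np.diff(np.log(eps))    # finite-difference gradients
    return float(slopes.mean())

# Uniform white noise fills whatever space it is embedded in
rng = np.random.default_rng(1)
y = rng.uniform(size=1500)
d1 = effective_correlation_dimension(y, n_hat=1, tau=1, eps_lower=0.01, eps_upper=0.1)
d2 = effective_correlation_dimension(y, n_hat=2, tau=1, eps_lower=0.02, eps_upper=0.1)
print(d1, d2)  # expected to lie near 1 and 2 respectively
```

On white noise the estimate simply follows n̂, because noise has no low dimensional structure to reveal; a low dimensional deterministic signal would instead show the estimate levelling off as n̂ grows.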

Figure 3.19 shows the correlation sum calculated for awake, sleep and epileptic EEG, using embedding delay τ = 30 samples (about 0.06 seconds at 512Hz sampling) and embedding dimension n̂ = 4. At least 30 seconds of data is used for each computation, meaning that at worst there are of the order of 15,000 samples available. The effective correlation dimension D_2^eff(n̂, ε) is calculated as the gradient of ln C(n̂, ε) against ln ε over the range in which the curve follows a rough straight line. Here both the awake and asleep states show roughly equal complexity, whilst the epileptic seizure is lower dimensional, as expected, because its slope is shallower.

Notice also the saturation effects, best observed for the 'normal awake' state. For small values of ε there is insufficient data and no points can be found that are closer to each other than this distance, so the correlation integral approaches zero. For large ε all data lie within this distance of each other, and the correlation integral saturates.

FIGURE 3.19: Estimated correlation integral for normal, sleep and seizure EEG. The computed correlation integral is a relative measure that should not be interpreted as an absolute value. The gradient of the seizure curve is smaller than for sleep and awake EEG, indicating a smaller D_2^eff(n̂, ε), and thus a smaller dimension. The dimensions of sleep and awake EEG are roughly similar to one another. τ = 0.06 seconds is used, equivalent to τ = 30 samples when F_s = 512Hz, with n̂ = 4. For each estimate as much data as available was used, with at least 30 seconds of data for each sequence.


3.3.4.3 Lyapunov Exponents — How Predictable is a System? 


The predictability (or deterministic structure) of a system z[n] can be mathe- 
matically described by its Lyapunov exponents. These are statistics that give 
an idea of how much the system changes when a small perturbation or change 
is introduced [131, 159, 183]. For example, return to the bouncing ball ex- 
ample but now assume that it has come to a stop on a smooth flat surface. 
Pushing the ball softly moves the ball, but it will not go far from its resting 
point and it is easy to tell where it will end up. Its behavior is predictable 
and the Lyapunov exponents of this system are small. However, if the ball is 
resting on top of a narrow ledge any small force in the wrong direction can 
result in the ball falling off the edge. It could end up anywhere depending 
on the topography of the descent. This behavior is less predictable and its 
Lyapunov exponents are larger. 

An N dimensional system has N Lyapunov exponents, denoted λ_i with i = 1, 2, ..., N. Each λ_i gives an idea of the amount of contraction (λ_i < 0) or expansion (λ_i > 0) of the system in the direction of the corresponding dimension [34, 82, 131, 183]. However, since random perturbations are expected to cause the most change in the direction of the largest Lyapunov exponent, λ_max = max_i(λ_i) is sufficient to reliably and reproducibly characterize the predictability in the dynamics of a system [72]. It is the most unpredictable behavior that limits the predictability of a system.

Again, due to the limitations imposed by finite time signals and non-stationarities, a short term estimate of λ_max is used [72, 71]. One example algorithm that estimates the largest Lyapunov exponent from a reconstructed system ẑ[n] assumes that λ_max can be computed by testing the exponential divergence of trajectories. Suppose we select an arbitrary time point n_0. We look at trajectories of length Δn, and compare the divergence between a trajectory starting at ẑ[n_0] and one starting at ẑ[n]. Averaging over all such trajectories where ẑ[n] is ε-close to ẑ[n_0] gives an estimate of the Lyapunov exponent. This is accomplished through the following computation [81]:

$$
\lambda_{\max}(\hat{n},\epsilon,\Delta n) = \frac{1}{T}\sum_{n_0=1}^{T} \ln\left( \frac{1}{N_{n_0}} \sum_{n:\,\lVert \hat{z}[n]-\hat{z}[n_0]\rVert < \epsilon} \big\lVert \hat{z}[n_0+\Delta n] - \hat{z}[n+\Delta n] \big\rVert \right) \qquad (3.40)
$$

In the above, the difference between where ẑ[n_0] ends up at time n_0 + Δn and where ẑ[n] ends up after the same number of steps Δn is measured. The average is taken over N_{n_0} points, for values of n in which ẑ[n] is within distance ε of ẑ[n_0]. The process is iterated over a large number of trials T, where a different starting point n_0 is selected for each.

λ_max(n̂, ε) is computable if there is a range of Δn over which λ_max(n̂, ε, Δn) remains constant. In theory this constant should be the same for all Δn, but this does not occur in practice due to limited data records. If all nearby trajectories end up in roughly similar positions, then λ_max(n̂, ε) is small and the system is predictable. Otherwise λ_max(n̂, ε) is large and the system is less predictable.

The two averages in Equation 3.40 normalize results by the number of trajectories N_{n_0} and the number of trials T. Averaging results in better estimates from noisy data. Amplitude normalization for different systems must be applied to y[n] prior to computation.
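The sketch below evaluates the double average of Equation 3.40 for Δn = 0, 1, ..., Δn_max and then reads λ_max off as the least-squares slope of that average against Δn. Turning the per-Δn values into a single exponent in this way is an assumption on our part (it follows the divergence approach associated with Kantz), as are the Euclidean norm and the hypothetical function names. The test signal is the logistic map x[n+1] = 4x[n](1 - x[n]), whose largest Lyapunov exponent is known to be ln 2 ≈ 0.693 per iteration.

```python
import numpy as np

def delay_embed(y, n_hat, tau):
    """Rows are delay vectors (y[n], y[n + tau], ..., y[n + (n_hat - 1) tau])."""
    y = np.asarray(y, dtype=float)
    n_vec = len(y) - (n_hat - 1) * tau
    return np.column_stack([y[i * tau: i * tau + n_vec] for i in range(n_hat)])

def divergence_curve(y, n_hat, tau, eps, dn_max):
    """Double average of Equation 3.40 for dn = 0, 1, ..., dn_max."""
    z = delay_embed(y, n_hat, tau)
    n_ref = len(z) - dn_max                 # every point needs dn_max future steps
    dns = np.arange(dn_max + 1)
    total = np.zeros(dn_max + 1)
    used = 0
    for n0 in range(n_ref):
        d0 = np.linalg.norm(z[:n_ref] - z[n0], axis=1)
        nbrs = np.flatnonzero(d0 < eps)
        nbrs = nbrs[nbrs != n0]             # a point is not its own neighbour
        if nbrs.size == 0:
            continue                        # no eps-neighbours: skip this trial
        # |z[n0 + dn] - z[n + dn]| for every neighbour n (rows) and every dn (columns)
        sep = np.linalg.norm(z[nbrs[:, None] + dns] - z[n0 + dns], axis=2)
        mean_sep = sep.mean(axis=0)         # inner average over the N_n0 neighbours
        if np.any(mean_sep == 0):
            continue                        # identical trajectories: log undefined
        total += np.log(mean_sep)
        used += 1                           # number of trials T actually used
    if used == 0:
        raise ValueError("no neighbours found: increase eps")
    return total / used

def max_lyapunov(y, n_hat, tau, eps, dn_max):
    """Least-squares slope of the divergence curve, in nats per sample."""
    s = divergence_curve(y, n_hat, tau, eps, dn_max)
    slope, _intercept = np.polyfit(np.arange(dn_max + 1), s, 1)
    return float(slope)

# Logistic map with known exponent ln 2 per iteration
x = np.empty(2000)
x[0] = 0.2345
for k in range(len(x) - 1):
    x[k + 1] = 4.0 * x[k] * (1.0 - x[k])
lam = max_lyapunov(x, n_hat=1, tau=1, eps=0.002, dn_max=5)
print(lam, np.log(2))  # the estimate should be in the vicinity of ln 2
```

Multiplying the result by the sampling rate F_s converts it to nats per second. Δn_max must stay below the point where separations saturate at the size of the attractor, otherwise the slope is underestimated.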

Lyapunov exponents computed for awake, sleep and epileptic EEG are shown in Figure 3.20, using τ = 30 samples (about 0.06 seconds), embedding dimension n̂ = 4 and ε ≈ 1% of the signal amplitude. The estimates are roughly equal over the different Δn, although, as is common in this type of analysis, this convergence is very rough and averages must be taken over Δn. Relative to each other, the seizure EEG is more predictable than normal awake or asleep EEG because its maximum Lyapunov exponent is closer to zero. Sleep and awake EEG seem to display similar predictability.


3.3.4.4 Entropy — How Random is the System? 


Entropy is a measure first used in thermodynamics to give an idea (at the 
macro-scopic level) of how disorderly a system is. It is one of the most abused 
technical terms in the world, largely because so many interpretations exist. 
FIGURE 3.20: Estimated maximum Lyapunov exponent for normal, sleep and seizure EEG. The computed statistic is a relative measure that should not be interpreted as an absolute value. Notice the high level of variability observed in each case; this is typical in the estimation of Lyapunov exponents. However, on average a flat line can be drawn for each. This line is much closer to zero for the seizure EEG, indicating that it is more predictable than sleep or awake EEG. τ = 0.06 seconds is used, equivalent to τ = 30 samples when F_s = 512Hz, with n̂ = 4 and ε of 1% of the signal amplitude, normalised within each window. For each estimate as much data as available was used, with at least 30 seconds of data for each sequence.


Entropy is a measure of randomness because it quantifies the amount of disorder in a system. It is also a measure of information because the level of randomness indicates how much information can be transferred in a single measurement taken from the system. It is a measure of compressibility because the less information a signal carries, the more efficiently it can be compressed and communicated.

In general all entropies give an idea of, at least qualitatively, the same 
concept — the amount of order or disorder in a system. The more random 
something is, the more information we need to capture its essence. Let us 
return to the bouncing ball example, again at a stop. Unless something un- 
foreseen happens the entire future of the ball is known: it will stay in the same 
place. It is a completely deterministic system and only one measurement (its 
position) is necessary to know everything about the future. Its entropy is 
low (zero). Now imagine that the ball is bouncing, but the direction that it 
bounces in each time is random and it is impossible to tell where it will go 
next. Measurements must be taken at all times because nothing can be said 
about the future of the ball. This random system has high entropy. 

Let us first define a general entropy of a discrete probability distribution p(k) ∈ [0, 1], where k indexes the different possibilities and p(k) denotes the probability of event k occurring:

$$
H_q(\hat{n},\epsilon) =
\begin{cases}
-\sum_{k} p(k)\,\log_2 p(k) & \text{if } q = 1 \\[4pt]
\dfrac{1}{1-q}\,\log_2 \sum_{k} p(k)^q & \text{if } q > 1
\end{cases}
\qquad (3.41)
$$

A logarithm base 2 is used so that the entropy is defined in bits.

FIGURE 3.21: Coarse-graining in the computation of entropy. In (a) K = 10 bins are used to divide y[n]. This is fine-graining relative to (b), where only K = 6 bins are defined. The distance (in amplitude) between bins is ε. On the left of each graph is the PDF of the signal, that is, the probability p(k) for each bin. Notice that the PDF depends on the length of the window used. If only 0.5 seconds are used to compute it (as opposed to the full 5 seconds), then the PDF between 0 and 0.5 seconds will look different from that between 1 and 1.5 seconds. Long enough time intervals must be used so that the computed PDF remains stationary. This is especially important when fine graining is used.

This entropy can be used in dynamical systems as follows. An arbitrary signal y[n] is quantized into K symbols (that is, K possible measurements or outcome events), where the set of bins k = 1, ..., K is known as the alphabet and represents all the possible symbols in the quantized signal. For example, if y[n] always lies between 0 and 1, dividing the [0, 1] range into 5 regions gives K = 5. This is also known as coarse-graining of y[n] into ε-sized bins, and in this case ε = 0.2. The probability p(k) of each symbol k occurring may be defined as

$$
p(k) = \frac{\text{time that } y[n] \text{ is in bin } k}{\text{total length of } y[n]\ (N)} \qquad (3.42)
$$

An example of coarse-graining for K = 6 and K = 10 is shown in Figure 3.21. The probability distribution p(k) is also shown. For very fine graining it is important that sufficiently long signals be used to compute these probabilities. For example, if only 0.5 seconds are used to compute p(k), then the PDF would look different at different times within the 5 seconds of data shown.

Coarse graining has the advantage of making the computation of entropy less susceptible to noise than estimates of Lyapunov exponents or correlation dimension. This comes at a cost: the difficulties that are avoided are instead shifted to the selection of an appropriate ε and alphabet scheme. Computation of H_q(n̂, ε) in theory requires the supremum to be taken over all possible quantization schemes, because quantization makes the entropy dependent on the choice of coordinates. The limit ε → 0 is also necessary, but not feasible for finite y[n].

Several interpretations of Equation 3.41 exist. When the embedding dimension n̂ = 1 (i.e. no embedding) and q > 1 it is known as the block entropy. Each integer q defines a different family of entropy. The special case H_1(n̂ = 1, ε), when q = 1, is known as the Shannon entropy, and it is the definition classically used in information theory to determine the capacity of communication channels. For example, suppose a signal y[n] ∈ (0, 1) is divided into K = 8 bins. Assuming all symbols in the alphabet are equally likely, that is, that Equation 3.42 yields p(k) = 1/8 for k = 1, 2, ..., 8, then

$$
\begin{aligned}
H_1(\hat{n}=1,\epsilon) &= -\sum_{k=1}^{8} \frac{1}{8}\log_2\frac{1}{8} && (3.43)\\
&= 8 \cdot \frac{1}{8} \cdot 3 && (3.44)\\
&= 3 \text{ bits} && (3.45)
\end{aligned}
$$

This result tells us that, on average, 3 bits are needed to represent each sample of the quantized y[n].
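The coarse-graining of Equation 3.42 and the entropy of Equation 3.41 translate directly into code. This sketch assumes equal-width bins over a known range [lo, hi] and reproduces the K = 8 worked example; `coarse_grain`, `symbol_probabilities` and `renyi_entropy` are hypothetical names.

```python
import numpy as np

def coarse_grain(y, K, lo, hi):
    """Map y into K equal bins of width eps = (hi - lo) / K; returns symbols 0..K-1."""
    eps = (hi - lo) / K
    k = np.floor((np.asarray(y, dtype=float) - lo) / eps).astype(int)
    return np.clip(k, 0, K - 1)             # a value exactly at hi goes in the last bin

def symbol_probabilities(symbols, K):
    """Equation 3.42: fraction of samples that fall in each bin k."""
    return np.bincount(symbols, minlength=K) / len(symbols)

def renyi_entropy(p, q):
    """Equation 3.41 in bits: Shannon entropy for q = 1, otherwise the q > 1 branch."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                            # empty bins contribute nothing
    if q == 1:
        return float(-np.sum(p * np.log2(p)))
    return float(np.log2(np.sum(p ** q)) / (1 - q))

# Worked example: y in (0, 1) divided into K = 8 equally likely bins
y = (np.arange(800) + 0.5) / 800            # 100 samples fall in each of the 8 bins
p = symbol_probabilities(coarse_grain(y, K=8, lo=0.0, hi=1.0), K=8)
print(p)                                    # eight values of 0.125
print(renyi_entropy(p, q=1))                # 3.0 bits, as in Equation 3.45
print(renyi_entropy([0.5, 0.25, 0.25], q=1))  # 1.5 bits: a skewed alphabet needs fewer
```

For the uniform case every q gives the same 3 bits; the different entropy families only disagree once the symbols are not equally likely.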

The Kolmogorov-Sinai entropy is an interpretation of Equation 3.41 that explores transition probabilities between measurements, computing the block entropy for a sequence of symbols. It is analogous to using embedding as in Equation 3.35 and corresponds to values of n̂ > 1, because ẑ[n] is an n̂ dimensional object looking at a block of n̂ past observations. The longer the sequences used to compute the Kolmogorov-Sinai entropy, the higher the n̂ that is used for reconstructing z[n].

All the usual problems resulting from insufficient data for high dimensions and the selection of appropriate time delays apply. For a reconstructed system with n̂ dimensions and K symbols in the alphabet of each dimension, the total number of bins is K^n̂. As the dimension increases, the number of data points required to estimate H(ε) to any degree of accuracy grows exponentially.

146 Epileptic Seizures and the EEG

[Figure 3.22 plot: h_2(n̂, ε) against scale ε, with curves for normal awake, normal asleep and epileptic EEG]

FIGURE 3.22: Kolmogorov-Sinai entropy for normal, sleep and seizure EEG. The computed entropy is a relative measure that should not be interpreted as an absolute value. It is lower for seizure than for sleep or awake EEG over the range for which the correlation integral is valid, implying that seizures are more ordered than the other two. Sleep and awake EEG have a similar level of order. τ = 0.06 seconds is used, equivalent to τ = 30 samples when F_s = 512Hz, with n̂ = 4. For each estimate as much data as available was used, with at least 30 seconds of data for each sequence.

For q = 2 the Kolmogorov-Sinai entropy h_2(n̂, ε) is related to the correlation sum C(n̂, ε) defined in Equation 3.38 by

$$
h_2(\hat{n},\epsilon) = H_2(\hat{n}+1,\epsilon) - H_2(\hat{n},\epsilon) = \ln \frac{C(\hat{n},\epsilon)}{C(\hat{n}+1,\epsilon)} \qquad (3.46)
$$

h_2(n̂, ε) is only valid for values of ε for which the gradient of ln C(n̂, ε) with respect to ln ε is constant. Because h_2(n̂, ε) is only valid in the limit of large n̂, this type of entropy requires even more data to compute than Lyapunov exponents or correlation dimensions [81].
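Equation 3.46 can be evaluated from two correlation sums. The sketch below uses the maximum norm for distances (an assumption; the text does not fix a norm) and reuses the same delay vectors for both sums, so that the 2/(N(N-1)) factors cancel and, since adding a coordinate can only increase a maximum-norm distance, C(n̂+1, ε) ≤ C(n̂, ε). It returns nats, matching the natural logarithm in Equation 3.46. Uniform white noise should give h_2 ≈ -ln(2ε - ε²), while a pure sinusoid, being fully predictable, should give a value near zero. The function name is hypothetical.

```python
import numpy as np

def delay_embed(y, n_hat, tau):
    """Rows are delay vectors (y[n], y[n + tau], ..., y[n + (n_hat - 1) tau])."""
    y = np.asarray(y, dtype=float)
    n_vec = len(y) - (n_hat - 1) * tau
    return np.column_stack([y[i * tau: i * tau + n_vec] for i in range(n_hat)])

def correlation_entropy_h2(y, n_hat, tau, eps):
    """Equation 3.46: h2 = ln[C(n_hat, eps) / C(n_hat + 1, eps)], in nats."""
    z = delay_embed(y, n_hat + 1, tau)      # first n_hat columns = n_hat-dim embedding
    n = len(z)
    i, j = np.triu_indices(n, k=1)          # each pair n_i < n_j once
    d = np.zeros(len(i))
    for k in range(n_hat):                  # max-norm over the first n_hat coordinates
        d = np.maximum(d, np.abs(z[i, k] - z[j, k]))
    count_n = np.count_nonzero(d < eps)
    d = np.maximum(d, np.abs(z[i, n_hat] - z[j, n_hat]))  # add coordinate n_hat + 1
    count_n1 = np.count_nonzero(d < eps)
    if count_n1 == 0:
        raise ValueError("no close pairs at this eps: increase eps or use more data")
    return float(np.log(count_n / count_n1))  # the 2/(N(N-1)) factors cancel

rng = np.random.default_rng(2)
noise = rng.uniform(size=2000)
sine = np.sin(0.3 * np.arange(2000))
h_noise = correlation_entropy_h2(noise, n_hat=2, tau=1, eps=0.1)
h_sine = correlation_entropy_h2(sine, n_hat=2, tau=5, eps=0.1)
print(h_noise, -np.log(2 * 0.1 - 0.1 ** 2))  # these two should be close
print(h_sine)                                 # near zero for a periodic signal
```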

Shannon entropy H_1(n̂, ε) can also be computed in n̂ > 1 dimensions. The advantage of n̂ = 1 is that computation is fast and easy. In higher dimensions the data requirements again become prohibitive, but computation remains simpler than Equation 3.46. Shannon entropy is usually expressed with a logarithm base 2 so that results can be interpreted in bits (per symbol), a property that is relevant to information and compression theory, for which it is most often used.

The Shannon entropy is computed using a simple box-counting method, in which phase space is partitioned into ε-sized bins or boxes and the probabilities are calculated by counting the number of data points in each bin. This is the same as the computation of p(k) in Figure 3.21, but in multiple dimensions. To avoid the dependence of quantization on scale and magnitude, a signal is first normalized to unit variance as in Equation 3.13. In practice the bins are often allowed to overlap so that edge effects can be reduced.
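A box-counting version of the Shannon entropy in n̂ dimensions can be sketched as follows. Assumptions: normalization is taken to mean subtracting the mean and dividing by the standard deviation (Equation 3.13 is not reproduced here), boxes form a fixed, non-overlapping grid anchored at zero (the overlapping-bin refinement is omitted), and only occupied boxes are counted. The function name is hypothetical.

```python
import numpy as np

def delay_embed(y, n_hat, tau):
    """Rows are delay vectors (y[n], y[n + tau], ..., y[n + (n_hat - 1) tau])."""
    y = np.asarray(y, dtype=float)
    n_vec = len(y) - (n_hat - 1) * tau
    return np.column_stack([y[i * tau: i * tau + n_vec] for i in range(n_hat)])

def box_counting_shannon(y, n_hat, tau, eps):
    """Shannon entropy H1(n_hat, eps) in bits from the occupancy of eps-sized boxes."""
    y = np.asarray(y, dtype=float)
    y = (y - y.mean()) / y.std()            # unit variance, so eps is in std units
    z = delay_embed(y, n_hat, tau)
    boxes = np.floor(z / eps).astype(int)   # integer box index along each coordinate
    _, counts = np.unique(boxes, axis=0, return_counts=True)  # occupied boxes only
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

y = np.tile([0.0, 1.0, 2.0, 3.0], 100)     # four equally likely levels
print(box_counting_shannon(y, n_hat=1, tau=1, eps=0.5))  # 2.0 bits

noise = np.random.default_rng(3).standard_normal(4000)
h1 = box_counting_shannon(noise, n_hat=1, tau=1, eps=0.5)
h2 = box_counting_shannon(noise, n_hat=2, tau=1, eps=0.5)
print(h1, h2)  # for white noise the 2-D value is roughly twice the 1-D value
```

The repeating four-level signal keeps its 2 bits when embedded, because each level is always followed by the same next level; white noise instead gains entropy with every added dimension, which is why the number of occupied boxes, and the data needed to fill them, grows so quickly.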

Figure 3.22 shows h_2(n̂, ε) computed for the typical states, using τ = 30 samples (about 0.06 seconds with 512Hz sampling) and n̂ = 4. The entropy of epileptic seizures (for the values of ε that are valid) is lower than that of either asleep or awake EEG, supporting the notion that there is less randomness in the system. A more detailed exploration of Shannon entropy can be found in Chapter 5.


The non-linear statistics discussed in this section provide information about the non-linear system z[n] that generates the signal y[n]. The co-ordinate invariant features of z[n] can be approximated by using the Takens/Aeyels theorem to reconstruct the system ẑ[n] from y[n], as described by Equation 3.35. The co-ordinate invariant non-linear statistics can quantify:

• Complexity: The dimension N is a measure of how many degrees of freedom there are in a system. The correlation dimension is estimated as described by Equation 3.39.

• Predictability: The maximum Lyapunov exponent λ_max describes how predictable the system is by measuring the divergence of trajectories. It is estimated as described by Equation 3.40.

• Randomness: The entropy H quantifies the randomness of a system. Several entropy definitions exist. A generic definition can be found in Equation 3.41.


All methods require long data sequences and stationarity (as well as a 
much stronger form of ergodicity) in y[n] to be estimated correctly. Their 
importance and their limitations in detecting and predicting epileptic seizures 
are involved, and are discussed separately next. 


3.3.4.5 Non-Linear Dynamics and Analysis of the Epileptic EEG 


The intrinsic non-linear properties of a system (dimension N, Lyapunov exponents λ_i and entropy H) all give ideas of very similar properties. For example, the choice of ε in each case amounts to partitioning a phase space of dimension n̂ into boxes of size ε.


Using the box counting analogy, the correlation dimension measures how many points are found in nearby boxes; Lyapunov exponents measure how far from each other points starting in the same box end up; and entropy measures how likely it is that a point may be found in any one box. All give ideas of the predictability or randomness of a system. Correlation dimension, Lyapunov exponents and entropy interpret complexity by quantifying different ways in which a dynamical system evolves.


Predictability and randomness are in principle very well suited to characterize epileptic seizures, because the seizure EEG is best described as a more ordered signal than that of normal activity. During a seizure a single channel becomes more oscillatory and therefore simpler. Many channels viewed together become less complex because they tend to resemble each other more. Hence, both temporally and spatially, there is reduced complexity, less randomness and more predictability. It is not surprising, then, that non-linear analysis has been widely used to analyze the epileptic EEG⁹.

Dimension N, Lyapunov exponents λ_i and entropy H must be estimated using an intermediate system ẑ[n] reconstructed from the measured signal y[n]. In EEG analysis y[n] = x_c[n] is measured over a finite, non-stationary and noisy time-window. At the risk of repeating an important fact too often, weak stationarity may only be assumed in 20-30 seconds of recorded EEG, and this is typically insufficient to reliably compute any of these statistics¹⁰. Procedures by which approximations, rather than exact values, of these statistics can be calculated in this environment were outlined in previous sections: the effective correlation dimension D_2^eff(n̂, ε), the maximum Lyapunov exponent λ_max(n̂, ε) and the entropy H_q(n̂, ε).

The computation of both D_2^eff(n̂, ε) and λ_max(n̂, ε) is highly sensitive to noise, showing no regions of convergence when even 2-3% errors are present in the data [81]. Analysis of noisy data can sometimes be improved by determining the validity of the estimate through the rejection of a null hypothesis. These tests can in general only be applied off-line. For this reason D_2^eff(n̂, ε) and λ_max(n̂, ε) are often estimated only from intra-cranial EEG, which is less noisy than scalp EEG. Entropy and its variants are less susceptible to noise and have been applied more extensively.

The computational cost of non-linear analysis is high. Inefficiency in algorithms is an expense afforded only to problems in which no other solution is obvious. For seizure detection, which requires fine temporal detail, on-line algorithms based on non-linear measures are impractical. Non-linear analysis for the detection of seizures has been explorative rather than conclusive. For example, efforts are made in [33] to determine how suitable H is for discriminating two states, and [129] is a comparison of the relative merits of linear versus non-linear methods for EEG analysis. Neither is specific to epilepsy.

⁹ Unfortunately, as is often the case when new methods become popular, much of the literature seems more a case of applying them to whatever data are available than of showing an understanding of a complex problem. Readers should be warned that a certain degree of skepticism is needed when digging through the vast literature printed on the topic: often research is tested on very little data, and neither justification nor insight is provided.

¹⁰ 30 seconds of data at 512Hz sampling gives 15,360 samples. With n̂ = 4 and K = 10 there are 10^4 = 10,000 bins. With n̂ = 6 there are 10^6 = 1,000,000 bins. Only if the embedding dimension is low does using 30 second windows make any sense. However, a low n̂ will be an inadequate representation of the underlying system (unless there is a real collapse of dimension, as may be the case during epilepsy).


3.4 Detection and Prediction of Seizures in Literature 


We have now seen a good sample of the different methods available to analyze 
EEG data, and we have shown that at least in some cases these are able to 
characterize the difference between seizure and non-seizure activity. This sec- 
tion then looks, briefly, at how the different feature extraction methods have 
been used by seizure detectors and predictors as presented in the literature. 
It is assumed here that y[n] = x_c[n].

Let us first look at epileptic seizure detection in the time domain. The statistics that deal with magnitudes and variance (μ_y[k], σ_y[k], c_y[k] and v_y[k], presented in Section 3.3.1.1) are often only used as a second stage statistic, because by themselves they are not sufficient to reliably characterize seizure activity. First a feature is extracted from x_c[n], and its mean, variance, COV and total variation are computed over time. Some examples of detectors that use these methods include [55], [50], [165] and [170]; there are many more.
In [186] and [187] it was proposed that variance and total variation can be 
better discriminators of seizure activity than mean because these can detect 
how regularity changes over time. This is discussed in more detail in Chapter 
5. 

Changes in periodicity and synchronization, Section 3.3.1.2 and Section 
3.3.1.3 respectively, have been described in Chapter 1 as a key observable 
during many types of seizures. As such they are suitable features to extract 
for the detection of epilepsy. However although figures such as Figure 3.8(c) 
and Figure 3.9(f) show promise, perfect periodic activity in a single channel or 
perfect synchronization between channels is rarely observed, making the computation of CORR_y[τ, k] and XCORR_{y_i,y_j}[τ, k] unreliable. Again, as stand-alone statistics they have failed as seizure detectors. Furthermore, these measures of synchrony are often not useful in intra-cranial analysis because EEG
electrodes are close together and display high levels of synchrony even outside 
a seizure [75]. Nevertheless measures of periodicity have been used in detec- 
tion strategies in texts such as [97] and [117], and measures of synchronization 
have been incorporated in [74] and [75]. 

Spectral characteristics of an EEG are important for state identification, and it is not surprising that FFT based methods and their variants, described in Section 3.3.2, form part of many epileptic seizure detection systems. Recall that for the most part exactly the same information is contained in the time domain as in the frequency domain, but representations in the frequency domain allow us to more easily identify features key to the detection of epileptic seizures.

For example a series of papers ([8], [45], [135] and [178]) explore the fre- 
quency content of seizures estimated using methods described here as well as 
auto-regressive (AR) based models. They reason that exploiting a combina- 
tion of frequency and amplitude features to detect seizures is not new and 
virtually all methods capitalize on this notion. It is therefore important to 
know the features that have the most discriminating power. 

Several attempts have been made to extract spectral signatures of epileptic seizures in order to detect them [163]. These involve the identification
of stereotyped patterns in the evolution of frequency content that consistently 
present themselves before or during a seizure. However these signatures are 
patient dependent, cannot always be identified and their applicability on a 
larger scale is questionable. 

Time-frequency methods such as wavelet filters and their variants are designed to cope with non-stationarities, and they are today the method of choice to extract "frequency" information from the EEG. Classification using wavelets has been applied to recognize seizures as well as other pathologies, including schizophrenia and obsessive compulsive disorder (OCD) [65]. The Daubechies DB4 wavelet is the most common choice because it provides smooth enough frequency filtering to appropriately characterize the EEG, whilst remaining computationally efficient [176].

Wavelet based feature extraction specific to seizure detection goes as far 
back as 1994 (in [162]) where their potential for the classification of epilepsy 
was first explored. Much of the early focus lay on identifying the frequencies that are important. It was found that in general higher frequencies (> 50Hz), corresponding to lower scales a, can be ignored because of measurement noise; intermediate values of a can be used to detect the presence of artifact, whilst
the ratios of higher scales can be used to distinguish between the normal and 
the seizure states. 

Other time-frequency methods applied for the detection of epileptic seizures 
exist. In particular, the Gabor atom density derived from the original match- 
ing pursuit algorithm described in [41], [77], and [193] has claimed success in 
the detection of unspecific epileptic seizures. Gabor atoms are able to resolve 
different time and frequency scales in much the same way as wavelets, where 
a mother function is now translated and modulated at different frequencies. 
Gabor atoms are particularly tuned to pick out rhythmic activity, hence their ability to isolate the resonant frequencies of epileptic seizures.
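To make the idea of Gabor atoms concrete, here is a generic matching pursuit over a tiny, hand-picked dictionary of Gaussian-windowed cosines. It is not the Gabor atom density method of the cited works: the dictionary grid, the fixed zero phase and all function names are assumptions chosen for illustration. Each iteration picks the atom with the largest inner product with the residual and subtracts its contribution.

```python
import numpy as np

def gabor_atom(n_samples, u, s, f):
    """Unit-norm Gabor atom: Gaussian window centred at sample u with scale s,
    modulated by a cosine at f cycles per sample (zero phase, an assumption)."""
    t = np.arange(n_samples)
    g = np.exp(-np.pi * ((t - u) / s) ** 2) * np.cos(2 * np.pi * f * t)
    return g / np.linalg.norm(g)

def build_dictionary(n_samples, centres, scales, freqs):
    """One column per (centre, scale, frequency) combination."""
    params = [(u, s, f) for u in centres for s in scales for f in freqs]
    atoms = np.column_stack([gabor_atom(n_samples, u, s, f) for u, s, f in params])
    return atoms, params

def matching_pursuit(x, atoms, n_iter):
    """Greedy decomposition: repeatedly subtract the best-matching atom."""
    residual = np.asarray(x, dtype=float).copy()
    chosen = []
    for _ in range(n_iter):
        corr = atoms.T @ residual               # inner product with every atom
        k = int(np.argmax(np.abs(corr)))
        chosen.append((k, float(corr[k])))
        residual = residual - corr[k] * atoms[:, k]
    return chosen, residual

n = 256
atoms, params = build_dictionary(n, centres=range(0, n, 16), scales=[16, 32, 64],
                                 freqs=[0.02, 0.05, 0.1, 0.2])
# Two bursts of rhythmic activity at different times and frequencies
x = 3.0 * gabor_atom(n, 64, 16, 0.1) + 1.0 * gabor_atom(n, 192, 32, 0.05)
chosen, residual = matching_pursuit(x, atoms, n_iter=2)
for k, coef in chosen:
    print(params[k], round(coef, 3))  # the two atoms used to build x, strongest first
print(np.linalg.norm(residual))       # close to zero: both bursts recovered
```

Real EEG is not an exact sum of dictionary atoms, so in practice the residual never vanishes and the number of iterations, or a residual-energy threshold, must be chosen.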

Non-linear tools are thought to be too data intensive and computationally 
taxing to be used in the problem of seizure detection. However some work 
exists, for example in [198] where non-linear tools that require less data are de- 
veloped for detection of seizures by first projecting signals into a linear space. 
This destroys the non-linear characteristics of the EEG, but emphasizes regularity. However, the computational disadvantages of non-linear analysis remain, whilst the approach provides no more information than linear measures could extract.

Seizure prediction has been attempted with much less success than de- 
tection. Prediction using accumulated energy time-domain measures (more 
sophisticated versions of the theory presented in Section 3.3.1.1) failed to 
identify the pre-seizure state because sleep and post-seizure EEG, which are 
high in energy, interfere with results. Energy based measures of this nature 
cannot be used to predict seizures in a stand-alone manner, but may be used 
as complementary features to other algorithms (see [39] for details). 

Some believe that synchrony is key in prediction. Mormann et al [112, 113] 
and Chavez et al [28] rely on capturing the synchronicity between different 
channels of the EEG to predict the onset of seizures. Their work is based on 
calculations that use both Equations 3.19 and 3.24. The authors have demon- 
strated marked decreases in synchronicity long before an epileptic seizure, re- 
gardless of the type of test [112, 113]. This de-synchronization is explained by 
the slow entrainment of neurons into a seizure: at the start of a seizure
only the focus is involved, so its activity differs from that of surrounding
sites. Under very restrictive conditions, correct prediction was found to occur
in 80% of cases, and more if restricted to temporal lobe epilepsies,
with de-synchronization occurring from 4 to 221 minutes before a seizure. In
accordance with previous observations maximum synchronization is observed 
at the onset of a seizure. The examination of the spatial patterns of entrain- 
ment to aid optimisation of channel selection as well as to characterise the 
route to an epileptic seizure was introduced in [28] by applying the theory to 
narrowband signals. 
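One widely used synchrony index in this line of work is the mean phase coherence, $R = |\langle e^{i(\phi_x(t) - \phi_y(t))}\rangle_t|$. Here the instantaneous phases $\phi_x$ and $\phi_y$ are taken from the analytic (Hilbert-transformed) signals of two channels. This section does not show whether $R$ matches Equations 3.19 and 3.24 exactly, so the sketch illustrates the idea only and is not the authors' implementation. $R$ equals 1 for perfectly phase-locked channels and is close to 0 for unrelated ones. In practice it would be computed on narrowband-filtered, windowed EEG; the sampling rate, window length and test signals below are assumptions.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT (the construction behind a Hilbert transform)."""
    n = len(x)
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0          # double positive frequencies
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)   # negative frequencies are zeroed

def mean_phase_coherence(x, y):
    """R in [0, 1]: 1 = constant phase difference, ~0 = no phase locking."""
    dphi = np.angle(analytic_signal(x)) - np.angle(analytic_signal(y))
    return np.abs(np.mean(np.exp(1j * dphi)))

fs, n = 256, 2048                      # assumed sampling rate and window length
t = np.arange(n) / fs
a = np.cos(2 * np.pi * 5 * t)          # two channels locked at a fixed lag
b = np.cos(2 * np.pi * 5 * t - 0.8)
noise = np.random.default_rng(0).standard_normal(n)
print(round(mean_phase_coherence(a, b), 3))   # 1.0
```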

Nevertheless little independent evidence exists that this algorithm is appli- 
cable to more than specific cases. Furthermore, it is implied in the calculation 
that the entrainment into seizure is slow — this is not true for many patients. 
However as a start to the difficult task of seizure prediction the ideas presented 
in this work merit reflection. 

As in detection, spectral features form at least part of many published
seizure predictors. For example, spectral signatures are used in [117]. The
idea is that for many people the route into a seizure is consistent, and pattern
recognition methods can be used to detect its precursors. However, as in
detection, these signatures are patient dependent, cannot always be identified,
and their applicability on a larger scale is questionable. For those patients for
whom this works it should be exploited as much as possible, but recognizing
patterns reveals nothing about the underlying dynamics, leading to missed
events in cases where the evolution into a seizure is not typical.

The importance of the non-linear tools described in Section 3.3.4 lies
in prediction rather than detection of seizures. These methods assume the
gradual entrainment of neurons into a seizure, slowly resulting in an EEG
that is more ordered and less random. $D_2(m, \epsilon)$, $\lambda_{\max}(m, \epsilon)$ and $H_2(m, \epsilon)$ are
expected to decrease slowly prior to a seizure, reaching their minimum
immediately prior to the event. This assumption has been extensively explored.
$D_2(m, \epsilon)$ is used as a predictive measure in some of the literature, such as
[90], where it is shown that the expected transitions to low dimensionality
occur prior to a seizure. This was achieved by treating $D_2(m, \epsilon)$ as a
non-absolute and rather informal definition of dimension. Predictive times ranged
between 4 and 25 minutes. Higher temporal resolution can be achieved with
the more computationally efficient, adaptive algorithm proposed
in [119].

Maximum Lyapunov exponents are explored in [71]. The authors observe a marked
decrease in $\lambda_{\max}(m, \epsilon)$ beginning more than 2 hours before seizure onset, and a
sharp post-seizure increase. Their algorithm adaptively selects the best
channels to use for computation. Physiological significance has been suggested in [72],
where it was shown that pre-seizure entrainment of critical sites lasts signif- 
icantly longer than post-seizure disentrainment. This suggests that seizures 
may be mechanisms by which the brain is able to reset itself after a grad- 
ual transition toward pathological behavior. If this is so, pharmacological (or 
other forms of) intervention with the goal of resetting the brain once entrain- 
ment begins may alleviate the need for a seizure to occur. 

These types of statistics showed some promise at first, but recent tests on
simulated and real EEG data sets indicate that both Lyapunov exponents and
correlation dimension are unsuitable for predicting epileptic seizures [62, 88].

In some studies ([3, 152, 151]) entropy, Lyapunov exponents and correlation
dimension are estimated from different wavelet frequency bands rather than
from different embedding dimensions, on the assumption that the dynamics could
be disjoint at different scales. Insufficient testing makes the results
inconclusive, and no patterns common to many patients have been found.

In all cases of published predictors the validity of the algorithms is largely 
questionable. Data sets are often small, results are reproducible only for cer- 
tain patients and the effects of state of vigilance and artifact are not explored. 


Virtually all published detectors and many predictors of epilep- 
tic seizures use differences in frequency content as part of their 
feature set. Time-domain linear methods have typically been 
used as a supplement to other features. To capitalize on the advantages
of both, time-frequency analysis such as wavelet analysis
is becoming more popular. 


In the case of prediction the focus is on non-linear rather
than linear tools. However, after much work little success has
ensued, and it appears that prediction is a problem that will need
patient-specific methods and will most likely have to involve
other measurements in addition to the EEG.

Prediction is revisited in Chapter 7.


3.5 Conclusions 

Designing a classifier for a complex system such as the epileptic EEG involves 
several stages of signal processing. This chapter focuses on the preprocessing 
and feature extraction aspects of the detection process shown in Figure 4.1. 


The selection of a feature set that can discriminate between 
different states is imperative to the performance of a classifica- 
tion system. For the epileptic EEG the features must differenti- 
ate between non-seizure, pre-seizure and seizure. The extraction 
method must target those aspects that make seizures different 
from normal brain activity. Each feature must add at least some 
new information because features that target exactly the same 
information do not add any discriminating power, and make the 
classifier more expensive computationally. 


The signal processing tools employed to extract these features
must cope with noise, non-stationarity and possible
non-linearities in the EEG.


It is necessary that the signal be normalized so that compar- 
isons of the extracted features can be made between different 
times, different electrodes and different people. Normalizing a 
signal entails removing its mean and making its features scale- 
invariant. 
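Normalization can be sketched with z-scoring: subtract the mean, then divide by the standard deviation. This is one common way to make features scale-invariant. The text does not specify the method, and other choices, such as dividing by a robust estimate of spread, are equally possible. The usage lines show two recordings that differ only by gain and DC offset becoming identical after normalization. The signals themselves are invented.

```python
import numpy as np

def normalize(x):
    """Z-score a signal segment: zero mean, unit standard deviation."""
    x = np.asarray(x, dtype=float)
    sd = x.std()
    if sd == 0:
        raise ValueError("a constant segment cannot be scale-normalized")
    return (x - x.mean()) / sd

# Two recordings that differ only by amplifier gain and DC offset
# become identical after normalization.
rng = np.random.default_rng(1)
eeg = rng.standard_normal(1000)
print(np.allclose(normalize(eeg), normalize(40.0 * eeg + 7.0)))   # True
```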


In this chapter we surveyed signal processing tools that can extract features 
from any signal, but that are particularly well suited to the study of the EEG. 
The methods presented and the references to their literature are
not exhaustive, but aim to be representative of what is available and has
been tried over the last few decades. 

However apt the features, a detection or prediction system cannot work 
without a suitable classifier or the combination of the information in an expert 
way. These are discussed next. 


4 
Classifying the EEG 


In Figure 4.1 a complete detection system is divided into four tasks: prepro- 
cessing, feature extraction, decision making and expert system. The first two 
have been discussed in detail in Chapter 3. Here we focus on the rest, again 
with specific interest in the detection and prediction of epileptic seizures. We
assume that we have extracted a set of S features $(\zeta_1, \zeta_2, \ldots, \zeta_S)$ and that
these are suitable to distinguish between non-seizure, seizure and, in the case
of prediction, pre-seizure EEG. We must now make good use of
this information to decide in an expert manner to which of these classes the
features belong.

A classifier makes decisions based on information taken from single chan- 
nels, multiple channels or combinations of these. The classifier can be as 
simple as imposing a threshold on features, or can use more sophisticated ma- 
chine learning algorithms. It must first be trained on how to make decisions, 
and only then can it be applied to previously unseen data. Several types of 
classifiers are discussed in Section 4.1. 

The expert system is the overall strategy that is developed. Knowledge 
learned from preprocessing, feature extraction and classification is combined 
to make a final decision. It takes into account contextual differences such as 
those found between the EEG of different patients or different times of the 
day. The expert system is an entity that cannot be completely separated 
from either the classifier or the feature extractor because it encapsulates the 
entire process: a global strategy that determines what features to select, how 
to combine them and to what problem they are most applicable, so that
performance is optimized. For example, extracting features from a signal is
part of the feature extractor, but the selection of the correct features is part
of the expert system because it requires knowledge of the problem. More
examples are presented in Section 4.2.


[Figure 4.1 flowchart blocks: Data Acquisition & Preprocessing; Feature Extraction; Detection Strategy; Classifier; Expert System; Outcome.]


FIGURE 4.1: General structure of seizure detection algorithms. Data
preprocessing and feature extraction describe ways that the original EEG signals
are manipulated to extract information that can differentiate between the
non-seizure, pre-seizure and seizure states. These two are discussed in Chapter
3. Classification and the expert system, discussed in Chapter 4, determine how
these features are then used to decide which class they belong to.


4.1 Types of Classifiers 


Classifiers are automated mathematical constructs that separate
predetermined classes, under the assumption that the feature set presented to
them belongs to one of these classes. Formally,


A classifier maps an input space, in this case the feature set, to
a decision space O each time a new feature set is presented:


$$O = F(\zeta_1, \zeta_2, \ldots, \zeta_S). \qquad (4.1)$$


The map $F(\cdot)$ may be a linear or non-linear combination of
the features $(\zeta_1, \zeta_2, \ldots, \zeta_S)$.

Classifiers can be deterministic or probabilistic. The former
draw strict boundaries between different classes and return a
‘belong to’ answer. Probabilistic classifiers determine the likeli-
hood that a feature set belongs to a particular class; they return
a vector of the probabilities of belonging to each class.


Most classifiers discussed here are deterministic, although stochastic coun- 
terparts exist and are discussed where appropriate. 

The quality of the decisions is only as good as the design of the
classification system. For example, let us design a classifier to determine the owner
of a bouncing ball. Let the extracted feature of the ball be its color, and
train the classifier so that red balls are assigned to Mary and blue balls
to Terry. What does the classifier do when it comes across a green ball? It
was not trained to deal with this, and its decision will be unpredictable. It
is important to select an appropriate training set, that is, the data used to train the
classifier to make decisions. The training set must be representative of all
types of data likely to be presented to the classifier. In most cases not all
possibilities are known, and it is wise for the classifier to contain an ‘I do not
know’ state.
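The ball example, including the ‘I do not know’ state, can be sketched with a nearest-centroid classifier. It refuses to decide when a new feature set lies too far from every class it was trained on. The RGB color encoding, the class-name strings and the distance threshold `max_distance` are illustrative assumptions.

```python
import numpy as np

class NearestCentroidWithReject:
    """Assign the class of the closest training centroid, or refuse to decide."""

    def __init__(self, max_distance):
        self.max_distance = max_distance

    def fit(self, X, labels):
        X = np.asarray(X, dtype=float)
        labels = np.asarray(labels)
        self.classes_ = sorted(set(labels.tolist()))
        # One centroid (mean feature vector) per known class.
        self.centroids_ = np.array([X[labels == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, x):
        d = np.linalg.norm(self.centroids_ - np.asarray(x, dtype=float), axis=1)
        i = int(np.argmin(d))
        # Too far from every class seen in training: return the reject state.
        return self.classes_[i] if d[i] <= self.max_distance else "I do not know"

# Feature = ball color as (R, G, B); red balls are Mary's, blue balls are Terry's.
X = [[255, 0, 0], [230, 20, 20], [0, 0, 255], [20, 20, 230]]
owners = ["Mary", "Mary", "Terry", "Terry"]
clf = NearestCentroidWithReject(max_distance=100.0).fit(X, owners)
print(clf.predict([250, 10, 10]))   # Mary
print(clf.predict([0, 255, 0]))     # I do not know
```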

Sometimes, as is the case with detection of epilepsy, it is not possible 
to know all the types of data that will be presented. Consider the develop- 
ment of a classifier used to detect seizures that are very infrequent and of 
rare morphology. EEG databases are unlikely to have examples of this rare
phenomenon, and adding these seizures to the database may not be possible 
because typical EEG monitoring lasts days rather than months. The classifier 
cannot be trained to react to these events. 

Once a classifier is trained its performance must be evaluated using a 
testing set. This is part of the validation process that ensures the classifier is 
behaving as desired. The training and the testing data should where possible 
be mutually exclusive. The development process should avoid using data in 
the test set, and the validation process should not contain any of the training 
data. 
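One way to keep the two sets mutually exclusive is to split by whole recording rather than by individual EEG epochs. Neighbouring epochs from the same recording are strongly correlated, so a random epoch-level split would place near-duplicates in both sets. The sketch assumes each epoch carries a hypothetical recording identifier. The split granularity is an assumption and is not prescribed by the text.

```python
import numpy as np

def split_by_recording(recording_ids, test_recordings):
    """Return (train_idx, test_idx) such that no recording appears in both sets."""
    recording_ids = np.asarray(recording_ids)
    in_test = np.isin(recording_ids, list(test_recordings))
    train_idx = np.flatnonzero(~in_test)
    test_idx = np.flatnonzero(in_test)
    # Sanity check: training and testing data are mutually exclusive.
    assert not set(recording_ids[train_idx]) & set(recording_ids[test_idx])
    return train_idx, test_idx

# Hypothetical epochs drawn from three recordings, "A", "B" and "C".
ids = ["A", "A", "A", "B", "B", "C", "C"]
train_idx, test_idx = split_by_recording(ids, {"C"})
print(train_idx.tolist(), test_idx.tolist())   # [0, 1, 2, 3, 4] [5, 6]
```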

Three approaches to classification are presented in this section: association
rules, artificial neural networks (ANN) and support vector machines
(SVM). All three attempt to find an optimal map $F(\cdot)$ between feature
space and output space; they differ in the way in which decision
boundaries are obtained.


4.1.1 Association Rules 


Association rules use relationships between the features in the feature set and
the correct output to determine an appropriate $F(\cdot)$. The process is usually
manual and no formal training procedure exists. The feature set is inspected, 
simple relationships using features are selected and thresholds are used to 
make the decision. The relationships do not need to be simple but in practice 
often are because automated methods are better at finding the more complex 
ones. 

A threshold is a number that separates classes in a single dimension,
e.g., when applied to a single feature $\zeta_s$. A hyperplane is the analogous
decision boundary in multiple dimensions, e.g., when a combination of several
features is thresholded together. The first is simpler to compute but allows for
redundancy because features are treated independently. The second is
more powerful, but again the automated methods discussed later are better
suited to finding it. In either case it is not always easy, or even possible, to draw a
clear distinction between classes, as seen for example in Figure 4.3(b) and (c),
where a simple line or hyperplane cannot unambiguously identify each class.
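A toy two-feature data set shows the difference between thresholding features one at a time and thresholding a combination of them. The feature values and labels are invented for illustration. A brute-force search confirms that no threshold on $\zeta_1$ or on $\zeta_2$ alone separates the classes, whereas the line $\zeta_1 + \zeta_2 = 1$ does.

```python
import numpy as np

# Hypothetical feature sets (zeta_1, zeta_2) and labels (1 = seizure, 0 = non-seizure).
X = np.array([[0.9, 0.2], [0.2, 0.9], [0.6, 0.3], [0.3, 0.6]])
y = np.array([1, 1, 0, 0])

def threshold_rule(X, feature, theta):
    """Single-feature association rule: class 1 if zeta_feature > theta."""
    return (X[:, feature] > theta).astype(int)

def hyperplane_rule(X, w, b):
    """Threshold applied to a weighted combination of all features."""
    return (X @ w + b > 0).astype(int)

def single_feature_separable(X, y):
    """Try every midpoint threshold on every feature, in both polarities."""
    for f in range(X.shape[1]):
        v = np.sort(X[:, f])
        for theta in (v[:-1] + v[1:]) / 2:
            pred = threshold_rule(X, f, theta)
            if np.array_equal(pred, y) or np.array_equal(1 - pred, y):
                return True
    return False

print(single_feature_separable(X, y))                   # False
print(hyperplane_rule(X, np.array([1.0, 1.0]), -1.0))   # [1 1 0 0]
```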

In epileptic seizure detection all early algorithms developed the association 
rules manually because no better way existed. The focus was on finding 
features that showed marked differences between classes rather than on finding
complex maps $F(\cdot)$ [175, 177].

Recently, more involved methods have been used to try to reduce the number
of false classifications in seizure detection [188]. Decision trees trained to
represent $F(\cdot)$ are used in [135], and stochastic methods are presented in [156],
where Bayesian analysis is used to determine the conditional probability that
the partial EEG record belongs to a seizure. Relaxing strict boundaries
allows some flexibility in $F(\cdot)$, particularly when seizure probabilities can be
adjusted given the presence of commonly observed artifact.

Most seizure prediction algorithms today use thresholding because the
computational expense focuses on feature extraction rather than the training 
of more sophisticated classifiers. 


4.1.2 Artificial Neural Networks 


Artificial neural networks (ANN) are a mathematical, albeit rough, analogy 
to biological networks of neurons found in the brain. Each functional unit in 
an ANN resembles the behavior of a biological neuron, and when connected 
together artificial neurons or nodes are well suited for tasks such as pattern 
recognition and classification. In this chapter ANNs are described in the 
context of classification of the EEG where a feature set can belong only to 
a limited number of predefined classes. An ANN is trained to estimate the 
map $F(\cdot)$ in Equation 4.1. More information about the broader applicability of
ANNs can be found in [64] and [100].

The idea for ANN began as early as 1943 when a mathematician and a 
neuroanatomist (Pitts and McCulloch) decided to collaborate and concluded 
that a sufficiently large network of simple “all-or-none” functional units could 
be used to approximate any realistic function [64]. Since then the field has
grown tremendously, with applications ranging from system identification,
signal detection and classification to the modeling of real biological neural
networks [100].

An artificial network contains K interconnected nodes¹, indexed k = 1, 2, ..., K.
Each node is composed of three basic elements shown in Figure 4.2(a):


• Inputs: a set of M inputs to the node, $s_{k1}, s_{k2}, \ldots, s_{kM}$, each with its own
weight, $w_{k1}, w_{k2}, \ldots, w_{kM}$, that characterizes the influence that each input
has on that particular node. This structure is analogous to the dendrites
of biological neurons, although any “synapse” in an ANN has an adjustable
weight that can take a positive or negative value, whereas the strength and
polarity of a real synapse are more or less fixed.


• Integrator: a summation mechanism that combines the inputs as


$$v_k = \sum_{m=1}^{M} w_{km} s_{km} + b_k. \qquad (4.2)$$


This is analogous to the membrane potential observed at the soma of a
real neuron. The bias $b_k$ is used to change the DC offset of $v_k$.


• Activation Function: a function $\psi(\cdot)$ that transforms $v_k$ into the output
of the node, so that $y_k = \psi(v_k)$. In a biological neuron this is the function
that determines the firing of an action potential. The activation function
is used to limit the output range at each node.


¹ The term node is used from now on, as opposed to neuron, so as to differentiate these
artificial units from their biological counterparts.

FIGURE 4.2: Artificial neural networks. (a) shows the components of a single
artificial neuron, (b) shows typical activation functions, (c) shows an example
of a single layer feedforward network with M = 4 inputs, K = 4 nodes and
4 outputs, and (d) shows an example of a multi-layer feedforward network
with one input layer and one hidden layer. The dotted line shows how feedback may
be added to this feedforward network. Networks in (c) and (d) are both fully
interconnected.

It is through $\psi(\cdot)$ that non-linearity can be introduced into an ANN. Figure
4.2(b) shows examples of typical activation functions. The simplest form is
a threshold: when $v_k$ is above a certain value, $y_k$ is 1, otherwise it is zero.
A larger range of outputs can be introduced by using a piecewise linear or a
sigmoidal $\psi(\cdot)$; both are shown in Figure 4.2(b). The sigmoidal function is by far the
most commonly used because it is differentiable [64]. An example sigmoidal
function is


$$\psi(v_k) = \frac{1}{1 + \exp(-a v_k)}, \qquad (4.3)$$

where $a$ is the parameter that determines its slope. Note the similarities
between artificial neural networks and the modeling of biological neurons
described in Chapter 6. A threshold activation function is similar to that used
in integrate-and-fire models of biological neurons, whereas the sigmoidal
activation function is similar to that used in continuum models (see Equation
6.8).
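Equations 4.2 and 4.3, together with the threshold and piecewise linear alternatives, can be sketched for a single node as below. Several details are assumptions: the threshold position (zero), the particular piecewise linear form (linear between -1/2 and +1/2) and the input and weight values. Figure 4.2(b) may use different break points.

```python
import numpy as np

def integrate(s, w, b):
    """Eq. 4.2: v_k = sum_m w_km * s_km + b_k."""
    return float(np.dot(w, s) + b)

def threshold(v):
    # All-or-none output: 1 above the threshold (assumed at 0), otherwise 0.
    return 1.0 if v > 0 else 0.0

def piecewise_linear(v):
    # Linear between -1/2 and +1/2, saturating at 0 and 1 (one common form).
    return float(np.clip(v + 0.5, 0.0, 1.0))

def sigmoid(v, a=1.0):
    # Eq. 4.3: a sets the slope; as a grows the sigmoid approaches the threshold.
    return 1.0 / (1.0 + np.exp(-a * v))

def sigmoid_derivative(v, a=1.0):
    # The sigmoid is differentiable, which gradient-based training relies on.
    y = sigmoid(v, a)
    return a * y * (1.0 - y)

s = [0.5, -1.0, 2.0]        # M = 3 inputs to node k (values invented)
w = [0.4, 0.3, 0.1]         # weights w_k1, w_k2, w_k3
v = integrate(s, w, b=0.1)  # 0.2 - 0.3 + 0.2 + 0.1 = 0.2
for psi in (threshold, piecewise_linear, sigmoid):
    print(psi.__name__, round(psi(v), 3))
# prints: threshold 1.0 / piecewise_linear 0.7 / sigmoid 0.55
```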

Networks are created when multiple nodes are interconnected. Typically 
the nodes are arranged in layers, although in theory any configuration is 
possible. The layer to which the inputs of the classifier project is known 
as the input layer. Here the input layer is that to which the feature set 
$(\zeta_1, \zeta_2, \ldots, \zeta_S)$ is connected. The collection of nodes that connect directly to
the output of the classifier is known as the output layer, corresponding to O 
in Equation 4.1. Any layers between the input and output layers are referred 
to as hidden layers. 

Networks in this configuration can be loosely classified depending on the
number of layers and the types of interconnections found between them. They
may be [64, 100]:


1. Single Layer Feedforward: The network consists of only one layer of 
nodes and the input layer projects directly onto the output layer. An 
example with M = 4 inputs can be seen in Figure 4.2(c). The outputs
are not connected back to the input layers: it is a feedforward network.
This type of network can approximate linear maps $F(\cdot)$.


2. Multi-layer Feedforward: The network consists of at least one hidden
layer. The outputs of each layer project onto the next layer, and so
forth, until the output layer. The output of each layer never projects to
earlier layers; again, it is feedforward. An example of an ANN with one
hidden layer can be seen in Figure 4.2(d). Through the addition of at
least one hidden layer, non-linear maps $F(\cdot)$ can be estimated [178].


3. Recurrent/Feedback: A network in the configuration of (1) and (2) 
above, but where at least one output of a later layer projects back to the 
input of an earlier layer is called a recurrent network because feedback is 
added to the system. Adding recurrency allows the output of a node to 
be directly or indirectly affected by its previous outputs. Figure 4.2(d) 
shows how the multi-layer feedforward network can be transformed to 
include feedback connections. 
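The feedforward and recurrent configurations above can be contrasted with a one-hidden-layer network driven by a sequence of feature sets. For brevity the biases are omitted and the weights are random; both are simplifications for illustration. With feedback disabled, the same feature set always produces the same output. With feedback enabled, the output also depends on what the network produced previously.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def run_network(zetas, W_in, W_out, W_fb=None):
    """Feed a sequence of feature sets through a one-hidden-layer network.

    If W_fb is given, the previous output is fed back into the hidden layer
    (a recurrent network); otherwise the network is purely feedforward.
    """
    y_prev = np.zeros(W_out.shape[0])
    outputs = []
    for zeta in zetas:
        v_hidden = W_in @ zeta
        if W_fb is not None:
            v_hidden = v_hidden + W_fb @ y_prev   # feedback connection
        h = sigmoid(v_hidden)
        y = sigmoid(W_out @ h)
        outputs.append(y)
        y_prev = y
    return np.array(outputs)

rng = np.random.default_rng(0)
S, H, K = 4, 3, 2                                # features, hidden nodes, outputs
W_in = rng.normal(size=(H, S))
W_out = rng.normal(size=(K, H))
W_fb = rng.normal(size=(H, K))
zeta = rng.normal(size=S)
sequence = [zeta, zeta]                          # the same feature set, twice

ff = run_network(sequence, W_in, W_out)          # feedforward
rec = run_network(sequence, W_in, W_out, W_fb)   # recurrent
print(np.allclose(ff[0], ff[1]), np.allclose(rec[0], rec[1]))   # True False
```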


The number of connections in a network is also important. A network 
where every node in a layer is connected to every node in the next layer is 
fully interconnected. These are more flexible but require a greater number
of weights and thus more processing power than partially interconnected net- 
works. All networks shown in Figure 4.2 are fully interconnected. 

Once a network configuration is chosen (dependent on the task) the ANN 
must be trained to estimate the map F(-). During training the weights in 
the ANN are selectively and recursively adapted until the desired behavior is 
observed. This is similar to how the brain learns tasks by adapting the way 
in which its neurons interact. The manner in which the weights are adjusted 
is called the learning algorithm. 

Given an input and a desired output, the aim of the learning algorithm is to
adapt the weights in the ANN so that its output matches the desired output as
closely as possible. This is done through the introduction of a cost function
that penalizes divergence from the desired output. It may be as simple as the
Euclidean distance between the ANN output and the desired output, in which
case the cost function is a measure of the error. The learning algorithm aims
to select the set of weights that minimizes this cost function.

A plethora of such algorithms exists, each designed to work best with a
particular network configuration. The back-propagation learning algorithm,
designed for multilayer feedforward ANN and most commonly used with one hidden
layer, is an important one for classification problems. It is a form of
steepest descent optimization in which the gradient is computed layer by layer,
working backwards from the output, in a computationally efficient manner. The
weights of the nodes are adjusted so that the overall error between desired and
actual output is minimized. Like all gradient-based methods, it may converge
only to a local minimum, and the final weights normally depend on the
initial, randomly selected ones.
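A minimal sketch of batch back-propagation for one hidden layer of sigmoidal nodes follows, using the squared-error cost $E = \frac{1}{2}\sum (y - t)^2$. The XOR problem is used only because a network needs a hidden layer to solve it. The learning rate, the number of hidden nodes, the number of epochs and the initialization are all assumptions. Because the initial weights are random, different seeds can end in different, possibly local, minima.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def train_backprop(X, T, n_hidden=4, lr=0.5, epochs=5000, seed=0):
    """Batch steepest descent on E = 0.5 * sum((Y - T)**2).

    X: (n_samples, n_inputs), T: (n_samples, n_outputs) desired outputs.
    Returns the trained weights and the cost recorded at each epoch.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(size=(X.shape[1], n_hidden))   # random initial weights
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(size=(n_hidden, T.shape[1]))
    b2 = np.zeros(T.shape[1])
    costs = []
    for _ in range(epochs):
        # Forward pass through the hidden and output layers.
        H = sigmoid(X @ W1 + b1)
        Y = sigmoid(H @ W2 + b2)
        err = Y - T
        costs.append(0.5 * float(np.sum(err ** 2)))
        # Backward pass: local gradients propagated from the output layer,
        # using psi'(v) = psi(v) * (1 - psi(v)) for the sigmoid.
        delta_out = err * Y * (1.0 - Y)
        delta_hid = (delta_out @ W2.T) * H * (1.0 - H)
        # Steepest-descent weight updates.
        W2 -= lr * H.T @ delta_out
        b2 -= lr * delta_out.sum(axis=0)
        W1 -= lr * X.T @ delta_hid
        b1 -= lr * delta_hid.sum(axis=0)
    return (W1, b1, W2, b2), costs

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)     # XOR targets
weights, costs = train_backprop(X, T)
print(costs[-1] < costs[0])                         # True
```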

Another typical network configuration is the self-organizing map (SOM), which
utilizes competitive learning during the training process. Self-organizing maps
are single layer feedforward networks that are fully interconnected. The idea
behind competitive learning is to emulate some behavior of a biological brain.
Subgroups of neurons in the brain learn to react to a particular stimulus;
similarly, subgroups of nodes in an SOM learn to respond to certain input patterns.
During the learning process nodes compete to be activated by the input, but
only the node that minimizes the error (the node whose weights are closest to
the input) “wins” and has its weights adjusted. In Kohonen's formulation the
winner's neighbours on the map are also adjusted, to a lesser degree. In this way
nodes in an SOM learn to specialize, which makes SOMs well suited to pattern
recognition. Because nearby nodes usually react to similar patterns, the SOM
may be arranged in a 2D or 3D structure so that these distances can be
visualized and interpreted. Weights are initialized randomly so that each node
initially reacts differently to the input stimulus.
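The winner-take-all update can be sketched as follows. This is plain competitive learning. A full Kohonen SOM would also move the winner's neighbours on the 2D or 3D grid, which is what makes nearby nodes respond to similar patterns, and that neighbourhood function is omitted here. The learning rate, the epoch count and the two-cluster toy data are assumptions.

```python
import numpy as np

def competitive_learning(X, n_nodes, lr=0.1, epochs=20, seed=0):
    """Winner-take-all learning: only the node closest to each input is updated."""
    rng = np.random.default_rng(seed)
    # Random initialization (here: randomly chosen training vectors), so that
    # each node initially reacts differently to the same input.
    W = X[rng.choice(len(X), size=n_nodes, replace=False)].astype(float)
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            winner = np.argmin(np.linalg.norm(W - x, axis=1))
            W[winner] += lr * (x - W[winner])    # move only the winner towards x
    return W

# Toy feature sets scattered around two prototypical patterns.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0.0, 0.0], 0.3, size=(50, 2)),
               rng.normal([5.0, 5.0], 0.3, size=(50, 2))])
W = competitive_learning(X, n_nodes=2)
print(W[np.argsort(W[:, 0])])   # one node near (0, 0), the other near (5, 5)
```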

In EEG classification ANN is more flexible than association rules in
estimating the map $F(\cdot)$. The cost of implementing ANN is relatively small.
ANN is a non-parametric approach that assumes nothing about the underlying 
model and as such no selection of parameters outside those already discussed 
is necessary. The process can in many cases be largely automated, whereas 
the development of association rules is cumbersome. Nevertheless there are 
disadvantages and in EEG classification these include: 


• ANN is adaptive in that it can be retrained when the environment
changes, making it suitable for non-stationary signals or for problems in
which new data become available. However, this re-training process is
not possible on-line and may need supervision by an ANN expert. In
contrast, association rules allow any user not familiar with the theory
to change common thresholds so as to adjust performance [191].
Furthermore, adaptation does not guarantee that ANN performs better;
in some cases it may in fact degrade performance.


• ANN can be organized so as to reflect a chosen confidence level for its
decision. This is important.


• ANN can be trained to detect events which are not epileptic as well as
those that are. This allows inclusion of artifactual data known to trigger
false classifications so that the network can learn to reject them. It is
more difficult to implement this using association rules.


• ANN, like any other classifier, can only behave as well as its teacher. If
incorrect or insufficient data are used during training, the reaction of the
network can be unpredictable when presented with previously unseen
data.


Error-correction learning (back-propagation) for multi-layer feedforward networks
and competitive learning for SOMs are the most commonly used training methods for
the classification of EEG seizures. Their relative merits are as follows:


• Backpropagation is guaranteed to converge to a minimum of the cost
function, although convergence may be to a local rather than a global
minimum. Less is known about competitive learning for SOM networks,
and convergence even to a local minimum cannot be guaranteed.


• Backpropagation requires training data that are both epileptic and
non-epileptic so that the ANN can learn to distinguish the two classes. SOM
only requires examples of epileptic periods (and perhaps examples of
typical artifact) so that nodes may learn these patterns. This avoids
the problem of over-representation: the selection of the correct amount
of data so that the relatively infrequent epileptic events are not
under-represented during the training process.


• SOM can learn as many types of patterns as required, so long as there are
enough nodes in the network. The selection of an appropriate number of
nodes is a problem for both SOM and multi-layer networks. If too few
nodes are used then the network is incapable of differentiating between
complex patterns. If too many nodes are used then the network can
potentially learn to react to noise in the data. This problem is known
as over-parametrization [178, 190].


• Learning algorithms in ANN are computationally taxing, and convergence
to a minimum may be particularly slow for backpropagating algorithms.


ANN have been incorporated into classification systems designed to de- 
tect epileptic seizures as early as 1994 in [162]. Given their popularity and 
their abilities in pattern recognition it is not surprising that their use remains 
strong today. Early work including that in [137] bypassed the need for feature 
extraction, using the raw EEG as an input to the ANN. However extracting 
features is a way to compress the input space so that training of ANN is faster. 

SOMs have been used in [45], [44] and [137], but multi-layer feedforward networks,
typically with one hidden layer, remain the most popular. These networks
have been used for the classification of many EEG phenomena other than
seizure activity. For example, in [188] such a network is used to recognize common
EEG states such as sleep, alpha rhythms and artifact.

A series of publications [8, 178, 175, 177, 176] have recently emerged com- 
paring the performance of different types of ANN, used for seizure detection, 
to each other as well as to common association rule methods. Spectral fea- 
tures were used for training. They conclude that feedforward ANN is the most 
suitable choice, although the results indicate that differences in performance 
are minimal and may be a consequence of the development process. SOM is 
not included in this analysis. 

Early versions of REVEAL, one of the more sophisticated algorithms in the
literature, suggest the use of a probabilistic ANN (PNN). The major advantage
is that it supports incremental learning: a new type of seizure can be added
to the training set without the need to retrain the entire network. This comes
at a computational cost that is nevertheless acceptable given the alternative 
(complete re-training of multi-layer networks). A series of papers that describe 
this process are [192, 194, 191]. 
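The incremental-learning property can be sketched with the classic Parzen-window form of a probabilistic neural network. Each stored training pattern contributes a Gaussian kernel to the score of its class. Adding a new pattern, or a new seizure type, therefore only stores another kernel and requires no retraining of the rest. This is a generic PNN, not the REVEAL implementation described in [192, 194, 191]. The kernel width `sigma`, the feature values and the class names are illustrative assumptions.

```python
import numpy as np

class ProbabilisticNN:
    """Parzen-window classifier: one Gaussian kernel per stored pattern."""

    def __init__(self, sigma=1.0):
        self.sigma = sigma
        self.patterns = {}          # class label -> list of stored feature sets

    def add(self, x, label):
        # "Training" just stores the pattern; nothing else is retrained.
        self.patterns.setdefault(label, []).append(np.asarray(x, dtype=float))

    def class_scores(self, x):
        x = np.asarray(x, dtype=float)
        scores = {}
        for label, P in self.patterns.items():
            d2 = np.sum((np.array(P) - x) ** 2, axis=1)
            scores[label] = np.mean(np.exp(-d2 / (2 * self.sigma ** 2)))
        return scores

    def predict(self, x):
        s = self.class_scores(x)
        return max(s, key=s.get)

pnn = ProbabilisticNN(sigma=1.0)
pnn.add([0.0, 0.0], "non-seizure")
pnn.add([0.5, 0.0], "non-seizure")
pnn.add([5.0, 5.0], "seizure")

query = [-4.8, 5.1]                  # resembles a seizure type not yet seen
print(pnn.predict(query))            # non-seizure
pnn.add([-5.0, 5.0], "seizure")      # incremental learning: store one new pattern
print(pnn.predict(query))            # seizure
```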


4.1.3 Support Vector Machines 


Support Vector Machines (SVM) are another type of machine learning method suitable
for estimating the classification map $F(\cdot)$. SVMs are linear classifiers in that their
aim is to find a hyperplane that separates classes of objects. The hyperplane 
that simultaneously minimizes error and maximizes the distance between the 
two classes forms the classification boundary. 

An example of two classes that are linearly separable is shown in Figure
4.3(a). Here the hyperplane of separation in two dimensions is a straight
line between the black and gray circles. The optimal hyperplane is the line
that maximizes the margin of separation, that is, the distance between the
classes, also marked in Figure 4.3(a).


FIGURE 4.3: SVM classification problems. (a) is a linearly separable example,
(b) is a non-linearly separable problem which may be solved by allowing
soft margins and (c) is a completely non-linearly separable case in which
linear separation may be achieved only through projection to a higher dimensional
space. (d) illustrates this by showing how projecting a one dimensional space
to two dimensions may make the classification linearly separable.

Only a subset of the training data is used to compute the ‘optimal’ separation
hyperplane. These data are known as support vectors: they are the training
points closest to the decision boundary (on the edges of the margin in the
separable case), and they alone determine the hyperplane of maximum margin. The SVM
training algorithm determines which data to use as support vectors, and then uses
these to find the optimal hyperplane. Finding the support vectors consumes 
most of the processing time. The interested reader may find the descriptions 
in [64] useful. An implementation of the algorithm can be found in [173]. 

Sometimes it is not possible to linearly separate two classes, as shown in 
Figure 4.3(b) and (c). In (b) new training data that lie on the wrong side of 
the decision boundary are available, and the two classes are no longer linearly 
separable. In such cases the margin of separation is made soft, and the aim of 
the SVM algorithm is then to compute the hyperplane that allows some errors 
to occur whilst still maximizing separation between support vectors. This is 
achieved by the introduction of a slack variable that measures the deviation
from ideal separation. 
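The soft-margin problem can be written as minimizing $\frac{1}{2}\lVert \mathbf{w}\rVert^2 + C\sum_i \xi_i$ subject to $y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$. Here the $\xi_i$ are the slack variables and $C$ sets the penalty for violating the margin. Standard SVM solvers treat this as a quadratic program. The sketch below instead uses plain subgradient descent on the equivalent hinge-loss form, which is easier to read but slower and only approximate. The data, including one deliberately mislabeled point in the spirit of Figure 4.3(b), and the values of `C`, the step size and the iteration count are assumptions.

```python
import numpy as np

def train_soft_margin_svm(X, y, C=1.0, lr=1e-3, n_iter=5000):
    """Subgradient descent on 0.5*||w||^2 + C*sum(max(0, 1 - y_i*(w.x_i + b))).

    y must hold +1/-1 labels. Returns the weight vector w and offset b.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        margins = y * (X @ w + b)
        viol = margins < 1                  # points with non-zero slack
        grad_w = w - C * (y[viol][:, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def slack(X, y, w, b):
    """Slack xi_i = max(0, 1 - y_i f(x_i)); 0 means on the correct side of the margin."""
    return np.maximum(0.0, 1.0 - y * (X @ w + b))

X = np.array([[2, 2], [3, 1], [1, 3], [-2, -2], [-3, -1], [-1, -3], [1.5, 1.5]], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1, -1], dtype=float)   # last point is on the "wrong" side
w, b = train_soft_margin_svm(X, y)
print(np.sign(X[:6] @ w + b))       # [ 1.  1.  1. -1. -1. -1.]
print(slack(X, y, w, b)[-1] > 0)    # True: the outlier is absorbed by its slack
```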

In Figure 4.3(c) slack variables do not help: the classes are in no way
linearly separable. In such cases SVMs can still be used, thanks to Cover's
theorem, which states that non-linearly separable data may become linearly
separable with high probability if the input feature space is projected to a (usually
much) higher dimensional space [64]. A simplistic and extremely unrealistic example
that nonetheless illustrates this point can be seen in Figure 4.3(d). The left 
panel shows that in a single dimension no linear function can separate the 
two classes. However, imagine a transformation to two dimensions where a 
second dimension is introduced by assigning data the value of 0 and 1 in an 
alternating manner. This is shown in the right hand side of Figure 4.3(d). A 
line is now capable of separating the two classes. Cover's theorem does not 
say what an appropriate transformation should be, but several options exist, 
as discussed in [64]. 
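The alternating assignment of Figure 4.3(d) is deliberately artificial, since it effectively uses the labels themselves. A slightly more realistic illustration, assuming a hypothetical quadratic map x → (x, x²) chosen here for the example (not one prescribed by Cover's theorem or [64]), shows the same effect: a class surrounded on both sides by the other cannot be split by any single threshold in one dimension, but becomes separable by a straight line once the second coordinate is added.

```python
def best_threshold_accuracy(values, labels):
    """Best accuracy any single threshold (either polarity) achieves on a
    1-D feature, i.e. the best a linear separator can do in one dimension."""
    points = sorted(set(values))
    thresholds = ([points[0] - 1.0]
                  + [(a + b) / 2.0 for a, b in zip(points, points[1:])]
                  + [points[-1] + 1.0])
    best = 0.0
    for t in thresholds:
        for polarity in (1, -1):
            predicted = [1 if polarity * (v - t) > 0 else -1 for v in values]
            correct = sum(p == l for p, l in zip(predicted, labels))
            best = max(best, correct / len(labels))
    return best


def lift(x):
    """Hypothetical non-linear projection from one to two dimensions."""
    return (x, x * x)


xs = [-3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0]
labels = [1, 1, -1, -1, -1, 1, 1]  # class -1 sits between two groups of class +1

# In 1-D no threshold classifies every point correctly.
acc_1d = best_threshold_accuracy(xs, labels)
# After lifting, a horizontal line in the (x, x^2) plane separates the classes.
acc_2d = best_threshold_accuracy([lift(x)[1] for x in xs], labels)
```

Only the second coordinate is thresholded after lifting, which is the special case of a horizontal separating line in the new two-dimensional space.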

SVM training algorithms are more generic than ANN in that their appli- 
cability is wider. For example, the back-propagation algorithm was designed 
to perform best for a particular type of network, whereas SVMs work for a 
very wide class of problems [64]. Furthermore, SVMs have several advantages 
over ANN for the purpose of seizure recognition: 


• Both ANN and SVMs aim to find an ‘optimal’ decision boundary between classes, but whilst the former is computed by minimizing the classification error on the training set, SVMs maximize the distance between the boundary and both classes. This distinction implies that SVMs tend to generalize better than ANN to examples unseen in the training set, a property well suited to EEG data, which are noisy and varied in nature [170].


• The selection of a subset of data to use as support vectors means that the performance of SVMs is not affected by the over-representation of data in the training process, as is the case with ANN [170]. Given that so much more non-epileptic data exists, the danger of over-characterization is strong and this is a particularly useful property for seizure recognition. Less care is needed in the selection of a training set.


• Given identical training data, SVMs always converge to the same answer, unlike ANN, which may get stuck in local minima depending on (often random) initialization parameters [49].


• SVMs reduce the risk of over-parametrization or over-fitting through the use of the slack variable. This is not the case with ANN [49].


Nevertheless, SVMs are not as popular as ANN in EEG seizure classification. Examples in which they have been used include the classification of PSD-based features by [49] as early as 1999, and more recently [171], where the spectral features are extracted by wavelet-based methods. Testing on larger EEG databases is necessary before the applicability of SVM to seizure detection can be determined.


Whether through the use of association rules, ANN or SVM,
all classifiers are designed to estimate an optimal mapping func-
tion F(·) between feature space and class boundaries. The dif-
ferences between methods lie in how these decision boundaries
are calculated, what assumptions are made about the underlying
system and the computational complexity that is tolerated.


Association rules are non-formalized and non-automated
methods of finding F(·). This makes them highly flexible at
the expense of shifting all complications back to the designer:
appropriate models and assumptions must be derived. ANN
and SVM are more structured methods in that these underlying
assumptions already exist for a generic class of problems. This
makes them less flexible in general, the advantage being that
both ANN and SVM are non-parametric and require the choice
of only a few parameters by the designer. Thus, unlike associa-
tion rules, they can be largely automated to estimate a suitable
F(·) that does not rely on assumptions about the nature of the
data.


4.2 Expert System 


This section discusses in more general terms what may be incorporated into 
a complete detection or prediction system to account for differences observed 
between and within EEG traces. 

Choosing the correct number and type of features to use in a classifier is part of the expert system. The process cannot be approached naively with a “more is better” approach, because more features do not necessarily give a better classifier. Over-parametrization not only makes a problem computationally harder but also adds unwanted redundancy. If a feature is to be added to the feature set, it must provide new, or at least partially independent, information. Most features extracted from EEG, even if they employ different feature extraction methods, display some degree of correlation between them. In such cases a multivariate approach, where features are combined prior to any decision making, is likely to provide better results [75].
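One simple way to check whether a candidate feature adds partially independent information is to measure its correlation with the features already chosen. The sketch below is a greedy filter, assuming each feature is a list of per-window values; the 0.9 cut-off and the feature names are hypothetical. It only screens for redundancy; it is not the multivariate combination recommended in [75].

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def prune_redundant(features, max_abs_corr=0.9):
    """features: dict mapping a feature name to its values over windows.
    Keep a feature only if its |correlation| with every feature kept so far
    is below max_abs_corr (a hypothetical threshold)."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < max_abs_corr for k in kept):
            kept.append(name)
    return kept


# Hypothetical per-window feature values.
features = {
    "line_length": [1.0, 2.0, 3.0, 4.0, 5.0],
    "energy": [2.0, 4.0, 6.0, 8.0, 10.5],         # almost a copy of line_length
    "zero_crossings": [5.0, 1.0, 4.0, 2.0, 3.0],  # weakly correlated
}
selected = prune_redundant(features)
```

Because the filter is greedy, the order in which features are offered decides which member of a correlated pair survives.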

An EEG record is typically composed of many signals recorded from different parts of the head. Features can be extracted from each signal x_c[n] separately, where c = 1, 2, ..., C_TOT is the signal number, or from groups of signals combined together. Signals can be classified independently, resulting in C_TOT separate decisions, or features can be combined prior to the classification process. In either case the location of a signal has an impact on the decision, and decisions should make use of this spatial context.

There is also temporal context. Background activity can affect a decision, and expert systems should capitalize on this information. Temporal context is implicitly included in the feature extraction process through the selection of window sizes and their overlap, but how a classifier should react to recent detections, or to whether a person is asleep or awake, is information that can be added by the expert system.

Finally, performance can depend on environmental context, where algorithm parameters can be tuned to a particular case. For detection and prediction of epileptic seizures this is most often done by making the classification patient specific: features that are particular to a person are targeted. Almost all published algorithms claim a certain degree of user-tunability, although ANN and SVM based classifiers less so.

It is a reasonable assumption that complicated problems require sophisticated solutions [98]. The long history of EEG seizure detection and prediction has shown that naive methods cannot be successful and creativity in the expert system is required. For example, signals other than EEG, such as heart rate, could provide additional useful information.

The remainder of this section expands on the comments made thus far, giving examples of appropriate combination of features in the decision making process (Section 4.2.1), the different types of spatial and temporal context (Section 4.2.2) and how to make algorithms patient specific (Section 4.2.3). The reader is advised that any combination of these is possible as a complete strategy for a classification system.


4.2.1 Processing Decisions 


Decisions can be made for each EEG channel separately, at different points in 
time, with different degrees of certainty, etc. At the end of the day, however, 
a detector must give a yes/no/don't know answer. This is what is meant by 
processing of decisions. How is the information of all channels combined? Is 
this done at the feature extraction stage or at the classification stage? 

Complete strategies can involve the combination of many different classifiers. Association rules can be combined with ANN or SVM by applying them to features before or after the classification process. Figure 4.4 shows some possible configurations applied to features extracted independently from four EEG channels. The structure of (a) is as in [188], the structure of (b) is as in [171] (which uses an SVM rather than an ANN), and the structure of (c) is as in [194]. Sometimes rules can be developed as a two-stage detection process, one example being [98], where features are first classified using association rules. In this first stage 75% of the data are efficiently rejected. In the second stage detections are fine-tuned using more sophisticated classification, in this case ANN.

FIGURE 4.4: Example expert classifier combinations. In (a) each feature is passed, independently, to the same ANN. Association rules between these decisions are applied to obtain a final outcome. In (b) association rules are applied to the feature set first, the outcome of which is passed to a single ANN to make a decision. In (c) different ANN systems have been trained to recognize particular features. Decisions are made using all features at once for each “if/else” ANN rule, and association rules combine these as a final stage to obtain a decision.
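The two-stage idea can be sketched as a cascade in which a cheap rule discards most windows and only the survivors reach a costlier classifier. This is a minimal sketch under stated assumptions, not the system of [98]: the peak-amplitude rule, its threshold and the stand-in second-stage classifier are all hypothetical placeholders (in [98] the second stage is an ANN).

```python
def two_stage_detect(windows, cheap_score, second_stage, reject_below):
    """Return per-window decisions and how many windows reached stage 2."""
    decisions = []
    n_second_stage = 0
    for window in windows:
        if cheap_score(window) < reject_below:
            decisions.append(False)  # rejected cheaply by the first-stage rule
        else:
            n_second_stage += 1
            decisions.append(bool(second_stage(window)))
    return decisions, n_second_stage


def peak_amplitude(window):
    """Hypothetical first-stage feature: largest absolute sample value."""
    return max(abs(v) for v in window)


def mean_abs_above_50(window):
    """Hypothetical stand-in for a trained second-stage classifier."""
    return sum(abs(v) for v in window) / len(window) > 50.0


# Hypothetical windows of samples (arbitrary units).
windows = [[1, 2, 1], [100, 90, -95], [60, -5, 4], [3, -2, 2]]
decisions, n2 = two_stage_detect(windows, peak_amplitude, mean_abs_above_50,
                                 reject_below=50)
```

The saving comes from n2 being much smaller than the number of windows; the first-stage threshold trades that saving against the risk of rejecting true events too early.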

A final example of how an expert system may differ from a simple feature and threshold scheme is the one in [176], where the expertise is put into the training process. Many identical ANNs are trained simultaneously, but because they are initialized randomly they may converge to different local minima. The training system then selects the neural network that performs best through a form of competitive learning. An article that explains how this works is [19].
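A much simplified sketch of the selection idea follows, assuming independent random restarts rather than the competitive learning of [176] and [19]: several copies of the same simple model (here a perceptron, standing in for an ANN) are trained from different random initializations, and the one scoring best on held-out data is kept. Function names, seeds and data are illustrative.

```python
import random


def train_perceptron(X, y, seed, epochs=20, lr=0.1):
    """Perceptron with random initial weights; the last weight is the bias."""
    rng = random.Random(seed)
    w = [rng.uniform(-1.0, 1.0) for _ in range(len(X[0]) + 1)]
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
            if yi * score <= 0:  # misclassified: nudge the boundary
                for j, xj in enumerate(xi):
                    w[j] += lr * yi * xj
                w[-1] += lr * yi
    return w


def accuracy(w, X, y):
    """Fraction of points on the correct side of the boundary."""
    hits = sum(1 for xi, yi in zip(X, y)
               if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]) > 0)
    return hits / len(y)


def best_of_n(X_train, y_train, X_val, y_val, n_models=10):
    """Train n_models identically configured models from different seeds
    and keep the one with the highest validation accuracy."""
    models = [train_perceptron(X_train, y_train, seed=s) for s in range(n_models)]
    return max(models, key=lambda w: accuracy(w, X_val, y_val))


# Hypothetical training and validation data.
X_tr = [[2.0, 1.0], [3.0, 2.0], [-2.0, -1.0], [-3.0, -2.0]]
y_tr = [1, 1, -1, -1]
X_val = [[2.5, 1.5], [-2.5, -1.5]]
y_val = [1, -1]
best = best_of_n(X_tr, y_tr, X_val, y_val)
```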


4.2.2 Spatio-Temporal Context 


An expert system can add context by deciding how a classification is made given (1) the patient, (2) the time and (3) the location. For a patient un-specific detector only (2) and (3), the spatio-temporal context, are relevant. Both temporal and spatial information are implicitly included in any classification system through the choice of windowing and through the combination of EEG channels. Here, more explicit methods to include spatio-temporal context are discussed.

The simplest form of temporal context is to report a true detection only if detections are made in several consecutive windows. This is useful because it eliminates the misclassification of short bursts. It is incorporated in most systems in one form or another, an example being [50], which requires at least two detections in a row.
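A minimal sketch of the consecutive-detection rule, assuming one boolean decision per analysis window. With min_run=2 it mirrors the two-in-a-row requirement attributed to [50]; window length and overlap are not modeled, and the function name is hypothetical.

```python
def require_consecutive(window_flags, min_run=2):
    """Confirm a detection only at windows that complete a run of at least
    min_run consecutive positive windows; isolated positives are dropped."""
    confirmed = []
    run = 0
    for flag in window_flags:
        run = run + 1 if flag else 0  # length of the current positive run
        confirmed.append(run >= min_run)
    return confirmed


raw = [1, 0, 1, 1, 0, 1, 1, 1]
confirmed = require_consecutive(raw)
```

Raising min_run suppresses more short bursts but also delays every confirmed detection by min_run - 1 windows.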

Comparing to a background is another way by which temporal context can be added. For example, a classifier can adapt its behavior depending on whether the patient is asleep or awake, can ignore short bursts by smoothing features over time, or can use relative rather than absolute values wherever a seizure is expected to differ significantly from the background. Features can be made relative to a short-term background (a few seconds) or a long-term background (minutes to hours). Some detectors ([128, 127]) use median filters instead of averages to quantify background activity so as to avoid the interference of short, large-amplitude bursts. Forgetting factors and variable window lengths can be used to place more importance on recent events by either omitting or attenuating data that are less recent [194]. The work in [55] was the first to incorporate background updating, using 20 seconds at a time and including a gap of 12 seconds between background and present so that slow-onset seizures are not missed.
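A sketch of two of these ideas follows, assuming one feature value per second: a relative feature computed against the median of a background segment separated from the present by a gap, and a background estimate with an exponential forgetting factor. Pairing the median of [128, 127] with the 20-second background and 12-second gap of [55] is an illustrative combination of our own, and alpha = 0.95 is an arbitrary choice.

```python
import statistics


def relative_to_background(feature, t, bg_len=20, gap=12):
    """Ratio of feature[t] to the median of feature[t-gap-bg_len : t-gap].
    The gap keeps the start of a slow-onset seizure out of the background;
    the median resists short, large-amplitude bursts.
    Returns None if there is not enough history or the background is zero."""
    end = t - gap
    start = end - bg_len
    if start < 0:
        return None
    background = statistics.median(feature[start:end])
    if background == 0:
        return None
    return feature[t] / background


def forgetting_background(feature, alpha=0.95):
    """Running background estimate; older samples decay by alpha per step."""
    estimate = feature[0]
    history = []
    for value in feature:
        estimate = alpha * estimate + (1.0 - alpha) * value
        history.append(estimate)
    return history


# 40 s of steady activity followed by a sample five times larger (hypothetical).
feature = [10.0] * 40 + [50.0]
ratio = relative_to_background(feature, 40)

# The same, with a single large artifact inside the background segment.
with_burst = [10.0] * 41
with_burst[15] = 1000.0
with_burst[40] = 50.0
ratio_burst = relative_to_background(with_burst, 40)
```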

Most detectors incorporate spatial context at the end of the classification process. A decision is first made for each channel independently, and a true detection is reported only when more than a few channels agree. The trade-off lies in the number of agreeing channels required: enough to avoid spurious detections, but not so many that focal seizures manifesting on only a few signals are missed. The latter is also the reason why channels are treated separately in the first place. How some contextual information may be included by selective grouping is discussed in Chapter 5.
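The channel-agreement rule can be sketched as a k-of-C vote, assuming one boolean decision per channel per window. The threshold min_channels is a hypothetical parameter: raising it suppresses spurious detections but, as noted above, risks missing focal seizures that appear on only a few channels.

```python
def combine_channels(channel_flags, min_channels=3):
    """channel_flags[c][k] is the decision for channel c in window k.
    A window is reported only if at least min_channels channels agree."""
    n_windows = len(channel_flags[0])
    return [sum(bool(ch[k]) for ch in channel_flags) >= min_channels
            for k in range(n_windows)]


# Four channels, three windows (hypothetical per-channel decisions).
flags = [[1, 0, 1],
         [1, 0, 0],
         [1, 1, 0],
         [0, 0, 1]]
```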
In prediction of seizures, spatial context is often a much more involved process, requiring the careful selection of subsets of channels; an example is the synchronization work in [112, 113], where the predictor measures differences in synchronization at different sites. In the extraction of maximal Lyapunov exponents [71] there is selective inclusion of channels based on how similarly they behave. The role of such multivariate methods remains largely unexplored for detection of seizures [75].


4.2.3 Patient Specificity


Patient specific algorithms in EEG are suitable only when sufficient data are 
available. For the detection or prediction of seizures this means that a suf- 
ficient number of epileptic events must be recorded so that parameters can 
be tuned to a particular person. Often this is possible only for long-term 
monitoring where days rather than hours of EEG are collected [156]. 

Once sufficient seizure data are available, the expert system can be trained 
to detect (or predict) seizures only if they are stereotyped, that is, when the 
extracted features are consistently similar before or during the event. For- 
tunately for epilepsy the number of different types of seizures particular to 
a person is often small, and so long as an instance of each is present in the 
training process the classifier can be tuned. The algorithm can also be trained 
to recognize typical events that trigger false detections, often patient specific 
manifestations themselves [74]. 

Algorithms developed for patient un-specific cases can perform better when 
tuned to the patient specific case [45], but methods developed for patient 
specific cases cannot usually be generalized to be patient un-specific. Their 
performance is poor, as demonstrated in the work of [171], [139] and [140]. 

Patient specificity can be as simple as selecting channels known to be 
involved in seizures. For example focal seizures that never spread to the entire 
brain can be analyzed using only a subset of channels. This is particularly 
important for intra-cranial onset detectors when seizures begin in a very local 
and specific region. More sophisticated methods utilize spectral ([163, 191, 
192]) and temporal signatures ([139] and [140]) — patterns that are stereotyped 
and can be used to recognize events. All methods that employ the extraction 
of signatures are by necessity patient specific. 

Although the search for the perfect patient un-specific algorithm contin- 
ues for the detection of seizures, in prediction it is more or less accepted that 
classifiers must be customized to the individual [45]. Prediction of seizures 
is much more susceptible to the balance between the correct selection of fea- 
tures and spatio-temporal context than is detection. All methods in seizure 
prediction that have shown promise are patient specific. 
4.3 Conclusions


Designing a classifier for a complex system such as the epileptic EEG involves 
the stages shown in Figure 4.1. The features extracted using methods de- 
scribed in Chapter 3 must be combined, analyzed and classified in an expert 
manner. 


Given a feature set as an input, a classifier (association rules, 
ANN, SVM or otherwise) determines which class this feature 
set most likely belongs to. Machine learning algorithms such as 
those used for ANN or SVM help in standardizing and automat- 
ing the task of determining the decision rule. 


An expert system addresses the more difficult question: how do we make 
our classifier work best? 


A deep understanding of the problem is required to know 
what expert knowledge to incorporate into the overall detec- 
tion system. For epileptic seizure detection and prediction it 
involves adding contextual information that exploits the spatial 
and temporal aspects of the EEG. 


Detectors, classifiers and expert systems are not based on a true understanding of the relationship between epileptic brain behavior and the EEG. Instead, these algorithms try to emulate human decision making using the same EEG records. This chapter, together with Chapter 3, surveyed signal processing, classifiers and expert systems applied to the detection and prediction of seizures using EEG. The analysis presented and the references to the literature are not exhaustive; they point to the type of research conducted in the past few decades. Although hundreds of papers have been published in this area, it is fair to say that at this stage most are exploratory. This is a testament to the difficulties in the classification of the epileptic EEG, discussed next.


5 


Seizure Detection 


Automated detection of epileptic seizures from EEG records is an old problem, and work continues because detectors are driven by technology. However, to date little or no standardization exists in what ‘good’ performance or ‘good’ detection means.

To our knowledge no comprehensive review of seizure detection algorithms 
where state of the art detection strategies are compared on the same dataset 
has been published. The aim of this chapter is to explore the difficulties of 
seizure detection as well as to provide a norm under which current and future 
seizure detectors can be validated. We work toward developing a systematic 
and consistent framework that defines standard performance metrics. 

Of most importance is the standardization of the data that is being used. 
An algorithm is developed on a training dataset and evaluated on a testing 
dataset. 


Guidelines for validating an algorithm as a successful detector are
proposed in [53]. The testing dataset must:


1. Be different from the training dataset (to some extent). 
2. Contain many different types of seizures. 


3. Not be pre-selected in any way (e.g., based on seizure type) unless specif- 
ically required by the detector. 


A ‘sufficiently large’ database minimizes the risk that performance com- 
parisons between algorithms are biased, but is nevertheless an arbitrary and 
subjective requirement. It is not too difficult to get hold of thousands of gi- 
gabytes of data, but who controls how representative this data is? How is the 
bias of the professional marking the epileptic events accounted for? Compar- 
isons between algorithms tested on different ‘sufficiently large’ datasets may 
be unrepresentative. For this reason an additional guideline is proposed. 
The comparison of performance between detectors should be
standardized. This requires an independent evaluation and/or 
a common dataset used to test all algorithms. 


Whilst there are a few independent studies that evaluate the performance of algorithms (see [132, 194]), to date no mass testing of seizure detection algorithms on the same dataset exists. We attempt to remedy this in this chapter: several algorithms are tested and compared on the same dataset. The reported performance is of course relative rather than absolute. It is our hope that, with appropriate ethics clearance, this database may be made publicly available so that the results here become benchmarks for future research. This initiative would also make it possible for other researchers to test their work on the same data.

The metrics used to compare algorithms must also be standardized. The rate of true detections, the true positive rate (TPR), may be as important as the rate of false detections, the false positive rate (FPR). These, along with others, are discussed in more detail later.


Clear and pre-defined metrics that compare the performance of 
an algorithm must be reported. These metrics must wherever 
possible be objective rather than subjective. 


The problem statement of seizure detection is discussed and formalized in 
Section 5.1. It includes details of the data and the metrics used throughout 
this chapter, both of which depend on the application. 

Once a standard dataset and a way in which to evaluate performance is 
defined, it is then possible to explore the difficulties in designing a successful 
detector. In this chapter three key aspects of a detector such as that shown 
in Figure 4.1 are identified and tested separately: 


1. Evaluation of Classification Methods: A detector bases its decision on 
a classification method, as presented in Section 4.1. The aim in this 
test is to determine the relative ability of ANN and SVM to classify 
epileptic EEG data using the same features as input to all classifiers. 
This is discussed in Section 5.2.


2. Evaluation of Patient Un-specific Seizure Detectors: Current patient 
un-specific seizure detection algorithms are available by the hundreds 
but are seldom tested on comparable datasets. The aim of this test is 
to determine the relative performance of leading seizure detectors, as 
well as to provide a benchmark for future tests. Performance is evaluated
based on the ability of a detector to identify any part of a seizure
measured from scalp EEG records. This is discussed in Section 5.3. 


3. Evaluation of Onset Detectors: The ability to detect the beginning of 
a seizure opens the door for new technology aimed at aborting a seizure 
once it has begun. The aim of this test is to determine the relative 
performance of different features as onset detectors on the intra-cranial 
EEG. This is discussed in Section 5.4. 


The details and rationale behind the selected tests are all outlined in the
relevant sections.


5.1 The Problem of Seizure Detection 


There are many reasons why neurophysiologists need seizures to be detected 
from EEG. Seizure detection is useful for diagnosis: long hours of EEG must 
be reviewed and classified. Finding seizures is an important aspect of this. 
Sometimes it is necessary that seizures be detected on-line so that the spread 
of the seizure can be monitored, sometimes by injecting a radiation dye (ra- 
dionuclide) at the time the seizure starts. It is important that the beginning 
of a seizure be identified, particularly for patients whose seizures are rare. 
Seizure detection is also useful for treatment: timely drug delivery, electri- 
cal stimulation and even predictors require seizures or their precursors to be 
detected reliably. 

In all cases the automation of the detection process saves many hours and in some cases makes treatment viable. It is not surprising that after decades of research seizure detection remains an important focus of research. The motivation to improve detectors, as well as to develop new technology that targets specific applications, remains strong.

Why is seizure detection so difficult? Part of the problem is indeed that 
seizures can manifest themselves in many ways. The features most commonly 
observed in the EEG, reiterated from Chapter 1, are 


• Oscillations: Each channel often becomes more oscillatory, in contrast to the examples of normal EEG shown in Figure 1.9.

• Synchronization: The EEG channels behave more like each other and activity is global.

• Large amplitude: After seizure onset the EEG is often much larger in amplitude than prior to a seizure.


However it is important to remember that the task is made difficult because 
• Not all seizures display all or any of these features.

• Seizures can vary significantly between patients, within the same patient, within the same seizure at different times and within the same seizure on different channels.

• Many non-epileptic phenomena, including artifact, often have features similar to seizures.


So, coding a detector to target activity that ‘is different from background and not artifact’ seems like a good idea. However, extracting any one feature, or a set of features, capable of discriminating between seizure and non-seizure EEG (let alone artifact) is difficult.

This leads to yet another aspect that makes seizure detection difficult: even human experts often cannot agree on what is a seizure and what is not. It was shown that when four experts were asked to review the same EEG record, marked independently beforehand by another expert, only 92% of the events were also identified by at least one of the new experts, and just below 80% were identified by two or more experts [193]. Even less agreement was evident as to the time of onset and termination of seizures.

If human experts cannot agree, it is difficult to interpret the results presented by any published detector. For comparison to be fair, detectors must be tested on the same database marked by the same experts. This database may contain bias, since it is marked by human beings, but if this bias is uniform across all algorithms then their relative performance can be evaluated.

In the remainder of this chapter several tests are conducted whereby the 
same database, marked by the same human experts, is tested with the same 
performance criteria. This EEG database is described next, followed by the 
definition of standard metrics used for evaluation. 


5.1.1 The EEG Database 


Four groups of EEG data are available for testing. All datasets are acquired using the Compumedics E-series EEG, whose specifications are summarized in Table 5.1. The impedance of each channel is tested prior to recording: it must be high so that minimal current leaks through the electrodes. Each sample is recorded with noise levels guaranteed to be below 2µV. Hardware filters are applied so that the recorded signal lies between 0.15 and 105Hz prior to sampling at 256 or 512Hz. Additional filters can be applied post-acquisition. Figure 5.1(a) shows a typical scalp EEG recording setup, and (b) shows the dimensions of typical scalp, subdural and depth electrodes. More detailed specifications and usage instructions can be found in [30].

All scalp data was acquired using the international 10-20 electrode config- 
uration system shown in Figure 1.8. Twenty-one channels are recorded with 
reference to an electrode located near Cz and re-referenced post-acquisition. 
No. of Channels        up to 64
ADC Resolution         14 bits ±1 LSB
Sampling Rate          256 or 512Hz (Software Configurable)
Sampling Type          Sample and hold on all channels
Input Ranges           1, 2, 4, 8 and 16mVp-p
Noise                  2µVp-p maximum
Bandwidth              Guaranteed 0.16Hz-105Hz
Electrode Offset       Up to ±350mV allowable DC offset
Electrode Impedance    100kΩ range; software allowable testing
Filters                High Pass 0.15Hz (Hardware)
                       Low Pass 105Hz (Hardware)
                       Notch 50Hz, 60Hz (Software)
                       Additional software filters available
Amplifier Accuracy     ±1%
CMRR                   >110dB with 10kΩ imbalance

TABLE 5.1: Compumedics E-series EEG specifications [30].
FIGURE 5.1: A typical scalp EEG recording setup is shown in (a). Electrodes attached to the scalp feed to a computer for storage after amplification, filtering and sampling. In (b) are example dimensions of typical scalp, subdural and depth electrodes.
The electrode configuration of intra-cranial data was specific to the subject’s condition.

All scalp records were marked by professional electroencephalographers from St Vincent’s Hospital, Melbourne, Australia. To reduce bias, bursts shorter than 20 seconds are not used for evaluation. To reduce bias from patients with frequent seizures, only the first 10 seizures are included in testing, and all EEG data after the first 10 seizures are discarded. There is no other form of pre-selection prior to testing other than to ensure that seizures were electrographically identifiable.

The details of each dataset are outlined next and summarized in Table 5.2. Which datasets are used for each evaluation, and how, is discussed in more detail in the relevant section. Wherever possible all data are normalized using Equation 3.13.


5.1.1.1 Group 1 - Scalp EEG Data (< 6 Seizures per Patient) 


This database contains scalp-recorded EEG from 15 patients. Between 1 and 5 seizures are recorded for each patient. A total of 41 seizures are identified in 361 hours of EEG recording.

The data in this group are used in the evaluation of patient un-specific 
seizure detectors (Section 5.3). When acquiring scalp EEG data it is not 
unusual to have some corrupted recordings due to, for example, improper 
attachment of electrodes. Where appropriate, performance is also evaluated 
when channels with excessive amounts of artifact are removed from analysis. 
These results are labeled as ‘good channels only’. 


5.1.1.2 Group 2 - Scalp EEG Data (6-10 Seizures per Patient)


This database contains scalp-recorded EEG from 6 epileptic patients. Between 6 and 10 seizures are recorded for each patient. A total of 51 seizures and 165 hours of EEG are available.

The data in this group are also used in the evaluation of patient un-specific seizure detectors, but are differentiated from Group 1 because of the large number of seizures that each patient experiences. This makes them ideal for patient specific applications that require separate training and testing datasets, and they are used in the evaluation of classification methods (Section 5.2). As with Group 1 (and where appropriate), results are compared to when only good channels are used.


5.1.1.3 Group 3 - Scalp EEG Data, Non-Epileptic Patients


This database contains scalp-recorded EEG from 8 patients who are not epileptic. Only 11.5 hours of EEG records exist because routine monitoring of non-epileptic subjects is rarer in hospitals.

The data in this group are useful to complement the performance evalua- 
tion of patient un-specific seizure detectors (Section 5.3) because they provide 
Patient  Seizures  Mean Sz         Record Time  No. EEG   Sampling
#        #         Length (secs)   (hours)      Channels  Rate (Hz)

Group 1 - Scalp EEG data, < 6 seizures per patient.
Number of patients = 15, Seizures = 41, Total testing time = 361.9 hours
1        3         185             6.7          32        512
2        4         349             32.3         32        512
3        2         268             25.2         32        512
4        3         140             24.1         32        512
5        1         66              6.9          32        512
6        4         58              15.8         32        512
7        3         149             65.5         32        512
8        3         80              31.3         32        512
9        5         74              40.8         32        512
10       3         109             41.9         32        512
11       1         109             17           32        512
12       1         99              4.5          32        512
13       5         48              37.5         32        512
14       1         92              7.3          32        512
15       2         74              5            32        512

Group 2 - Scalp EEG data, 6-10 seizures per patient.
Number of patients = 6, Seizures = 51, Total testing time = 165.6 hours
1        7         65              20.6         32        512
2        10        529             16.6         32        512
3        10        135             14.4         32        512
4        6         22              30.3         32        512
5        8         24              58.4         32        512
6        10        52              25.4         32        512

Group 3 - Scalp EEG data, non-epileptic patients.
Number of patients = 8, Total testing time = 11.4 hours
1        n/a       n/a             0.7          32        256
2        n/a       n/a             1.6          32        256
3        n/a       n/a             1.8          32        256
4        n/a       n/a             2.2          32        256
5        n/a       n/a             1.1          32        256
6        n/a       n/a             0.3          32        256
7        n/a       n/a             0.4          32        256
8        n/a       n/a             3.1          32        256

Group 4 - Intracranial EEG data.
Number of patients = 3, Seizures = 16, Total testing time = 96.5 hours
1        6         123             23.5         6         512
2        5         82              23.8         6         512
3        5         199             49.2         6         512

TABLE 5.2: Summary of available EEG data divided into testing groups.


a comparison of how performance is affected by the presence of epileptic bursts between seizures. Whilst 11.5 hours is small in comparison to other databases, it is used only as complementary information until more data can be gathered.


5.1.1.4 Group 4 - Intra-Cranial EEG Data 


This database consists of intra-cranially recorded EEG from 3 epileptic patients being monitored for surgical resection. From this database a total of 16 clinical seizures and 96.5 hours of EEG are used here. Sub-clinical seizures are also marked but not used in the analysis. The database was gathered, marked and made available by the Freiburg Seizure Prediction Contest (for details, see https://epilepsy.uni-freiburg.de/freiburg-seizure-prediction-project/).
FIGURE 5.2: Above is a representation of what a true positive (TP) and a false positive (FP) are. If a detection occurs during the seizure, it is a true positive. If a detection is made outside this time, then it is a false positive. Also marked is the detection onset delay.


The configuration and number of electrodes differed between patients. However, in all cases only the six bipolar-referenced signals judged by visual inspection to be most obviously involved in seizure activity are used.

Both the electrographic and clinical onsets are marked, making this 
database useful for testing of onset detectors described in Section 5.4. The 
length of time between electrographic and clinical manifestation was, on av- 
erage, 27 seconds for Patient 1, 2 seconds for Patient 2 and 23 seconds for 
Patient 3. 


5.1.2 Performance Evaluation Metrics 


Ideally a metric used to evaluate the performance of a detector should be 
objective. In reality subjectivity cannot be avoided because the EEG records 
must be marked by a human expert with his/her own individual bias. This 
can be minimized when relative rather than absolute performance is reported, 
as is the case when a common dataset is used. The manner in which seizure 
detections are reported is formalized prior to testing so as to increase compa- 
rability between tests. 


A positive detection is reported when flagged by the algorithm.
Detections within 60 seconds of each other are grouped so that
continuous bursts of positive detections are not over-represented.


A true positive (TP) is reported when a positive detection
occurs within the time marked as a seizure by a human expert.
Because onset and offset times are ambiguous, a TP is also
reported when a positive detection occurs within 60 seconds of
the onset or offset of a marked seizure. Only one true positive
is reported per seizure. A seizure that goes undetected is a
false negative (FN).


A false positive (FP) is reported when a positive detection
occurs outside the period from 3 minutes before the onset to 3
minutes after the offset of a marked seizure. This 3 minute
margin is excluded so that the epileptic activity frequently
observed leading up to a seizure is not counted as FPs. Only
one FP is reported for every 60 seconds of continuous FPs.


Combining all these, it is only possible to make a maximum of one (true or 
false) detection per minute. Example true and false detections can be found 
in Figure 5.2. 
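The reporting rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for this evaluation: detections and seizures are given in seconds, `group_detections` and `score_detections` are hypothetical names, and where the rules leave details open the sketch represents each group of detections by its first detection and starts a new group once 60 seconds have elapsed since the group began.

```python
from typing import List, Sequence, Tuple

GROUP_S = 60.0      # detections closer than this to a group's start are merged
TOLERANCE_S = 60.0  # slack around the marked onset/offset that still counts as a TP
GUARD_S = 180.0     # detections within 3 minutes of a seizure are neither TP nor FP


def group_detections(times: Sequence[float], group_s: float = GROUP_S) -> List[float]:
    """Merge detections into groups at most `group_s` long; return group start times."""
    starts: List[float] = []
    for t in sorted(times):
        if not starts or t - starts[-1] >= group_s:
            starts.append(t)
    return starts


def score_detections(times: Sequence[float],
                     seizures: Sequence[Tuple[float, float]]) -> Tuple[int, int, int]:
    """Return (TP, FN, FP) for one record; seizures are (onset_s, offset_s) pairs."""
    detected = [False] * len(seizures)
    fp = 0
    for t in group_detections(times):
        hit = False
        for i, (on, off) in enumerate(seizures):
            if on - TOLERANCE_S <= t <= off + TOLERANCE_S:
                detected[i] = True  # at most one TP per seizure
                hit = True
        if hit:
            continue
        near = any(on - GUARD_S <= t <= off + GUARD_S for on, off in seizures)
        if not near:
            fp += 1  # one FP per group, i.e. per 60 s of continuous FPs
    tp = sum(detected)
    return tp, len(seizures) - tp, fp
```

For example, with one seizure marked from 1000 s to 1060 s, detections at 100 s and 170 s count as two FPs, a detection at 900 s falls in the 3 minute margin and is ignored, and a burst at 1010-1030 s yields a single TP.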

Once TPs and FPs have been identified, the metrics that describe the performance of an algorithm can be formalized. These include¹:


• True Positive Rate (TPR) and Sensitivity: This metric gives the probability that a seizure is correctly identified. Formally it is defined as


$$\mathrm{TPR} = \frac{\text{Total number of TPs}}{\text{Total number of seizures}} \qquad (5.1)$$


Sensitivity and TPR are equivalent measures. 


• False Positive Rate (FPR) and Specificity (S): The FPR is the expected number of false positives per hour of non-epileptic EEG. Specificity is a related measure that describes the proportion of the record that is marked. Many definitions exist, but here it is defined as the fraction of non-seizure time that is incorrectly marked as seizure.


$$S = \frac{\text{Total FP time}}{\text{Total non-seizure time}} \qquad (5.2)$$


S and FPR provide similar information but are not the same. 


• Onset Delay: The onset delay is the time it takes for a detector to identify a seizure after its electrographic onset. Electrographic onset, in turn, is defined as the first time at which at least 2 channels are visibly involved in the seizure activity. This is more subjective than the other metrics because it depends heavily on how a human expert selects an often unclear point in time.
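The metric definitions above reduce to a few ratios. The sketch below assumes the counts have already been produced by the reporting rules of this section; the class `DetectionSummary`, its field names and `onset_delay` are hypothetical, and the total FP time is taken as an input because the text does not fix how long each flagged interval lasts.

```python
from dataclasses import dataclass


@dataclass
class DetectionSummary:
    tp: int               # seizures with a true positive (one per seizure)
    n_seizures: int       # seizures marked by the human expert
    fp: int               # false positives after grouping
    non_seizure_h: float  # hours of non-seizure EEG
    fp_time_s: float      # seconds of non-seizure EEG flagged as seizure

    def tpr(self) -> float:
        """Equation (5.1): fraction of marked seizures that were detected."""
        return self.tp / self.n_seizures

    def fpr(self) -> float:
        """False positives per hour of non-seizure EEG."""
        return self.fp / self.non_seizure_h

    def s(self) -> float:
        """Equation (5.2): fraction of non-seizure time flagged as seizure."""
        return self.fp_time_s / (self.non_seizure_h * 3600.0)


def onset_delay(electrographic_onset_s: float, detection_s: float) -> float:
    """Seconds between the marked electrographic onset and the first detection."""
    return detection_s - electrographic_onset_s
```

For instance, 8 of 10 seizures detected with 12 FPs and 720 s flagged over 24 hours of non-seizure EEG gives TPR = 0.8, FPR = 0.5 per hour and S ≈ 0.0083.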


1 Many different definitions of the metrics specified here exist. Although they are usually similar, the differences are sufficient to make comparing results difficult. In this chapter the definitions relevant to the problem of seizure detection are formalized, again so that the problem can be standardized.
Application           High TPR   Low FPR/S   Short Onset Delay
Clinical diagnosis    Moderate   Moderate    Very low
Radiation injection   Moderate   Moderate    Very high
Seizure predictor     High       Moderate    Very high


TABLE 5.3: Relative importance of performance based on application. 


TPR, FPR and S each give important information about the performance of an algorithm. Whilst TPR and FPR are sufficient to give an electroencephalographer a measure of performance, the specificity S adds an idea of how well an algorithm discriminates by estimating the proportion of time that a detector is flagging seizures. For example, a detector that marks 90% of the record as seizure can have a misleading TPR of 100%. Its FPR of 54 per hour (at most one FP per minute, so roughly 0.9 × 60 = 54), although very high, does not convey this information intuitively, whereas S does. There is redundancy in reporting both FPR and S, but for review purposes one may be more intuitive to interpret than the other. Since they are not difficult to calculate, it does not hurt to report both.

The computed TPR, FPR and S must not be over-influenced by any one patient. To reduce this bias, true detections have been limited by allowing a maximum of 10 seizures per patient (see the description of the EEG data). For example, it is not uncommon for a single EEG record to contain 20-30 seizures; in a database of 100 seizures this accounts for 25% of occurrences. If these seizures are particularly easy (or hard) to detect, results are not truly representative of how well seizures are detected across patients, particularly if the application of interest is patient un-specific. Similarly, some patients exhibit an excessive number of false detections. In all cases the FPR is capped at 15 per hour, and the number of patients that exceeded this FPR is reported where appropriate.
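The per-patient caps can be sketched as below. The text states the caps (10 seizures per patient, 15 false detections per hour) but not how capped per-patient values are combined, so this sketch assumes the TPR is pooled over the retained seizures and the FPR is averaged across patients; `PatientResult` and `pooled_metrics` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

MAX_SEIZURES = 10  # seizures per patient that count towards the TPR
MAX_FPR = 15.0     # false detections per hour at which a patient's FPR is capped


@dataclass
class PatientResult:
    detected: List[bool]  # one flag per marked seizure, in chronological order
    fp: int               # false positives for this patient
    hours: float          # hours of non-seizure EEG for this patient


def pooled_metrics(patients: Sequence[PatientResult]) -> Tuple[float, float, int]:
    """Return (pooled TPR, mean capped FPR, number of patients above the cap)."""
    kept = [p.detected[:MAX_SEIZURES] for p in patients]  # first 10 seizures only
    n = sum(len(k) for k in kept)
    tpr = sum(sum(k) for k in kept) / n if n else float("nan")
    rates = [p.fp / p.hours for p in patients]
    n_capped = sum(r > MAX_FPR for r in rates)
    fpr = sum(min(r, MAX_FPR) for r in rates) / len(rates)
    return tpr, fpr, n_capped
```

With one patient contributing 25 detected seizures and another contributing 5 missed seizures and 40 false detections per hour, the first patient is limited to 10 seizures and the second to an FPR of 15, giving a TPR of 10/15 and a mean FPR of 7.5 per hour with one patient reported as capped.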

Another consideration is that many epileptic bursts occur throughout an EEG record even though they do not develop into seizures. Detections of these bursts are not TPs, but they are also not FPs. The issue is dealt with by including results using the Group 3 database, where all records are taken from non-epileptic patients. Since no part of these records is seizure related, they give a better estimate of the true FPR. This eliminates the need to specify whether a false detection is ‘interesting’ or not, as was done in some studies including [51] and [156]. For ethical reasons this strategy is only possible for scalp records.


The true positive rate (TPR), false positive rate (FPR),
specificity (S) and, where appropriate, the onset delay are
reported as metrics that can be used to compare the performance
of each algorithm.
To avoid over-representation of any one patient, its contribution
is limited to 10 seizures per person for the TPR and to 15 false
detections per hour for the FPR. This is particularly important
for patient un-specific tests.


FPRs are reported for both epileptic and non-epileptic records
so that bursts of epileptic activity that are neither TPs nor FPs
do not bias results. Specificity is only listed for epileptic
records, to give an idea of the proportion of EEG that can be
discarded by the detector.


The interpretation of the performance depends on the application because 
of the relative importance of each metric. For example a short onset delay is 
important for a system designed to stop a seizure once it has started, but not 
relevant for a clinical review of EEG. Table 5.3 shows the relative importance 
of performance for the applications listed at the beginning of this section. 


In [192] an example combined metric P that weighs the relative perfor- 
mance of a patient un-specific seizure detector is presented: 


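$$P = \max\left[0, S_{\max} - S\right] \sqrt{\mathrm{TPR}} \times \left(1 - \log_{10} \min\left[\mathrm{FPR}_{\max}, \max\left[\mathrm{FPR}_{\min}, \mathrm{FPR}\right]\right]\right) \qquad (5.3)$$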


where FPR_min, FPR_max and S_max are the minimum and maximum values tolerated for each metric, defined by the user depending on the application. The formula looks complicated, but it is only an example of how the importance of each metric can be weighed: for instance, P can be made zero when a metric exceeds its maximum allowable level, and made to increase with increasing TPR.
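As a worked illustration, the example metric can be evaluated directly, reading Equation (5.3) as P = max[0, S_max − S] · √TPR · (1 − log10 min[FPR_max, max[FPR_min, FPR]]). The function name is hypothetical, and expressing TPR and S as fractions (rather than percentages) and FPR per hour is an assumption.

```python
import math


def combined_metric(tpr: float, fpr: float, s: float,
                    fpr_min: float, fpr_max: float, s_max: float) -> float:
    """Example combined metric P (Equation 5.3 as read above).

    tpr and s are fractions in [0, 1]; fpr is false positives per hour."""
    s_term = max(0.0, s_max - s)                   # zero once S reaches S_max
    fpr_clamped = min(fpr_max, max(fpr_min, fpr))  # FPR_min keeps log10 finite at FPR = 0
    return s_term * math.sqrt(tpr) * (1.0 - math.log10(fpr_clamped))
```

With TPR = 0.81, FPR = 1 per hour, S = 0.01 and S_max = 0.05, the FPR factor is 1 and P = 0.04 × 0.9 = 0.036; any S above S_max gives P = 0.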

Other formulae for P can be conceived, for example ones that include the onset delay or weigh the metrics differently. Only the individual metrics are reported here.


As a summary of the performance of a detector, a receiver operating characteristic (ROC) curve is reported where possible. The TPR is plotted as a function of the FPR as a detection parameter is varied; an example is shown in Figure 5.3. The curve shows the trade-off between true and false detections, since an increase in true detections is usually accompanied by an increase in false detections. In general, the closer the ROC curve is to the top left corner (TPR=100%, FPR=0) and the further it is from the random detection rate (dotted line), the better the performance of the detector.
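An ROC curve of this kind can be traced by sweeping a detection threshold and recording one (FPR, TPR) point per value. The sketch below is deliberately simplified and hypothetical: it works on per-epoch detector scores, counts every flagged non-seizure epoch as one FP, and omits the 60 second grouping and 3 minute margin of Section 5.1.2.

```python
import numpy as np


def roc_points(scores, seizure_id, epoch_s, thresholds):
    """Return (FPR per hour, TPR) pairs, one per threshold.

    scores: detector output for each epoch.
    seizure_id: -1 for non-seizure epochs, otherwise the index of the seizure."""
    scores = np.asarray(scores, dtype=float)
    seizure_id = np.asarray(seizure_id)
    seizures = np.unique(seizure_id[seizure_id >= 0])
    non_seizure = seizure_id < 0
    hours = non_seizure.sum() * epoch_s / 3600.0
    points = []
    for th in thresholds:
        flagged = scores >= th
        # a seizure counts as detected if any of its epochs is flagged
        tp = sum(bool(flagged[seizure_id == s].any()) for s in seizures)
        fp = int(flagged[non_seizure].sum())  # every flagged epoch counts (no grouping)
        points.append((fp / hours, tp / len(seizures)))
    return points
```

Lowering the threshold moves along the curve towards higher TPR and, usually, higher FPR, which is the trade-off described above.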
FIGURE 5.3: A sample receiver operating characteristic (ROC) curve. An ROC curve is a graphical representation of performance in which the TPR is plotted versus the FPR as a detection parameter is varied. The closer the ROC curve is to the top left corner (TPR=100%, FPR=0) and the further it is from the random detection rate, the better the performance.


5.2 Evaluation of Classification Methods 


The job of a classifier in a seizure detector is to label features extracted from 
data as belonging to either seizure or non-seizure activity. Three different types of classifier have been described in Section 4.1: association rules, artificial neural
networks (ANN) and support vector machines (SVM). ANN and SVM have 
been introduced as an alternative to association rules because they largely 
automate the identification of a decision boundary, in particular simplifying 
the task when multi-dimensional feature vectors are involved. 

This section is dedicated to the evaluation of the performance of ANN 
versus SVM for the classification of the epileptic EEG. In the literature many 
detection algorithms employ either ANN or SVM with no justification as to the 
choice, other than to propose that they are a better alternative to association 
rules. There are few references (e.g., [188, 190, 156]) providing numerical 
evidence as to whether SVM or ANN is suited to pattern recognition in the 
EEG, and even fewer try to distinguish between them. 

The aim of this evaluation is to determine which type of classifier, when 
trained and tested on the same data, is best capable of correctly identifying 
characteristic behavior in the EEG. A patient specific test ensures that the 
high variability of observables between patients does not skew results. At the 
same time the (arguably smaller) variability found within the same patient 
determines which of the classifiers can generalize their results to similar but 
previously unobserved examples. 

Only Group 2 data are used for this evaluation because each record contains many seizures, making patient specific training possible. For consistency between patients only the first 6 seizures in each record are considered. Scalp rather than intra-cranial EEG is used because its noisier environment makes separating seizure from non-seizure activity much more difficult, and therefore provides a more demanding test of the classifiers.


5.2.1 Feature Extraction 


This evaluation is concerned with the relative rather than absolute performance of each classifier. Whilst feature extraction is key to the performance of a detector, here it is only necessary that the features be capable of distinguishing between typical seizure and non-seizure activity; they need not be optimal. Frequency analysis is used because of its well demonstrated ability to differentiate between seizure and non-seizure EEG.

Specifically, 2 second non-overlapping windows of single channel EEG data 
are filtered into frequency bands of 2-4Hz, 4-8Hz, 8-16Hz and 16-32Hz to keep 
in line with dyadic-style distributions used in wavelet analysis. These fre- 
quency ranges include all frequencies most commonly associated with epileptic 
seizures whilst excluding higher frequencies often associated with artifact. 

The mean signal power (squared value) after filtering is calculated for each frequency band. Each value is normalized by the total power, calculated as the sum over all bands. The resultant feature vector consists of four normalized energy values for each of the C_TOT channels. One epoch is characterized by the features in all channels, and the classifiers are trained and tested on this 4 × C_TOT dimensional feature vector.
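The feature extraction just described can be sketched with NumPy alone. One substitution should be stated plainly: instead of a bank of band-pass filters followed by squaring, this sketch sums the FFT power spectrum over each band of a 2 second epoch. Because each band value is divided by the total over all bands, the overall scaling cancels, but the result only approximates a filter-bank implementation. The function names are hypothetical.

```python
import numpy as np

BANDS_HZ = ((2, 4), (4, 8), (8, 16), (16, 32))  # dyadic bands from the text


def relative_band_power(window, fs, bands=BANDS_HZ):
    """Normalized band powers for one epoch.

    window: array (n_channels, n_samples) holding one 2 second epoch.
    Returns a flat vector of length 4 * n_channels (channel-major order)."""
    window = np.atleast_2d(np.asarray(window, dtype=float))
    n = window.shape[-1]
    power = np.abs(np.fft.rfft(window, axis=-1)) ** 2
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    band_power = np.stack(
        [power[:, (freqs >= lo) & (freqs < hi)].sum(axis=-1) for lo, hi in bands],
        axis=-1)
    total = band_power.sum(axis=-1, keepdims=True)
    total[total == 0] = 1.0  # flat channel: avoid dividing by zero
    return (band_power / total).ravel()


def epoch_features(eeg, fs, epoch_s=2.0):
    """Split a (channels, samples) record into non-overlapping epochs and
    return one feature vector per epoch (one row per epoch)."""
    eeg = np.atleast_2d(np.asarray(eeg, dtype=float))
    step = int(round(epoch_s * fs))
    n_epochs = eeg.shape[-1] // step
    return np.array([relative_band_power(eeg[:, i * step:(i + 1) * step], fs)
                     for i in range(n_epochs)])
```

At the 512Hz sampling rate of the Group 2 data, a 6Hz sine in one channel and a 20Hz sine in another put almost all of each channel's normalized power into the 4-8Hz and 16-32Hz bands respectively.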


5.2.2 ANN Training and Testing 


The performance of many types of ANN can degrade when too much data of one type is presented to them during training. This is problematic because in seizure detection at least 95% of recorded data is non-seizure, so performance depends on the data selection [52]. Extensive testing revealed that the ANN training too often failed to converge when both seizure and non-seizure data were used, so its performance could not be compared with that of the SVM.

For this reason this evaluation uses a self organizing map (SOM). This is a type of ANN that maps itself to the topology of a given training set, and can therefore be trained using only seizure data. See Section 4.1.2 for a more detailed description of SOM, and Figure 4.2 for some examples of ANN node configurations. All data within the seizures in the training set are used to train the SOM.

It is important that the number of nodes in the network is large enough 
to describe the topology of the feature vector whilst small enough to avoid 
over-fitting. Many network configurations with different numbers of nodes were tried for each patient, and the one that performed best was kept. Typically the best performance was obtained with 5 to 6 nodes.

FIGURE 5.4: Individual patient performance on Group 2 data. In each plot results are shown when 1, 2 and 3 seizures are used for training. These are compared to random detection rates, obtained by placing the seizures at random times along the EEG and averaging over 1000 trials. In Patients 1, 2 and 3 both ANN and SVM are capable of 100% TPR with near-zero FPR. Patients 4, 5 and 6 have less stereotyped seizure activity. Detection rates in Patient 4 are comparable for ANN and SVM, but performance is not far from random. SVM is capable of higher TPRs than ANN in both Patients 5 and 6. ANN performs no better than random guessing in Patient 6.

FIGURE 5.5: Average performance on Group 2 data, where the first 3 seizures are used for training and the next 3 for testing. SVM is shown to outperform ANN when detecting seizures it has been trained on, whilst achieving similar performance to ANN when presented with unseen EEG data.

Testing consists of presenting an input feature vector to each node in the network and measuring the distance between them. In theory, if the input is close to a particular node then its features are very similar to some subset of the training sequence. The distance between the input feature vector and the closest node is stored for thresholding.
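A minimal sketch of this SOM approach is shown below: a generic one-dimensional SOM trained on seizure feature vectors only, followed by the test-time distance to the closest node. This is a textbook SOM under assumed settings (6 nodes in a line, Gaussian neighbourhood, linearly decaying learning rate and neighbourhood width), not the configurations actually tried for each patient; `train_som` and `som_distance` are hypothetical names.

```python
import numpy as np


def train_som(data, n_nodes=6, n_iter=2000, lr0=0.5, seed=0):
    """Train a 1-D self organizing map on seizure feature vectors only.

    data: array (n_examples, n_features). Returns node weights (n_nodes, n_features)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # initialise the nodes on randomly chosen training examples
    weights = data[rng.choice(len(data), size=n_nodes, replace=len(data) < n_nodes)].copy()
    node_pos = np.arange(n_nodes)
    sigma0 = max(n_nodes / 2.0, 1.0)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                   # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)   # neighbourhood shrinks over time
        x = data[rng.integers(len(data))]
        bmu = np.argmin(np.linalg.norm(weights - x, axis=1))  # best matching unit
        h = np.exp(-((node_pos - bmu) ** 2) / (2.0 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)
    return weights


def som_distance(weights, x):
    """Distance from a test feature vector to the closest node (value to threshold)."""
    return float(np.min(np.linalg.norm(weights - np.asarray(x, dtype=float), axis=1)))
```

A small distance means the epoch resembles some part of the training seizures; sweeping the distance threshold produces the ANN curves of the kind shown in Figure 5.4.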


5.2.3 SVM Training and Testing


The SVM can be trained on both seizure and non-seizure data because, unlike 
ANN, the process is not overly affected by over-representation of non-seizure 
data. Again all seizure data in the training set is used to train the SVM. 
However to reduce the size of the training set only one third of non-seizure 
EEG is used. This did not affect results but did speed up the training time. 

Because of the complexity of the EEG it is unlikely that seizure and non- 
seizure data are linearly separable. A non-linear projection, as described in 
Section 4.1.3, is used. Several projection functions were tested but ultimately 
a Gaussian radial basis proved best. This is of course not exhaustive testing of 
SVM but proved sufficient for this investigation, and the question as to which 
projection function works best is not considered here. 

The SVM was trained on the same features as the ANN described earlier. That is, a 4 × C_TOT dimensional feature vector consisting of frequency information at different scales for each of the C_TOT channels is presented to the SVM.

Testing consists of presenting an input feature vector of the same dimension to the SVM and thresholding its output directly.
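The testing step can be illustrated with the standard kernel SVM decision function, f(x) = Σ_i α_i y_i K(x_i, x) + b, using a Gaussian radial basis function K(a, b) = exp(−γ‖a − b‖²). The sketch assumes the support vectors, the products α_i y_i (`dual_coef`), the bias and γ have already been obtained from an SVM solver; training is not shown, and the function names and parameter values are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """Gaussian RBF kernel exp(-gamma * ||a - b||^2) for every pair of rows."""
    a = np.atleast_2d(np.asarray(a, dtype=float))
    b = np.atleast_2d(np.asarray(b, dtype=float))
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative rounding errors


def svm_decision(x, support_vectors, dual_coef, bias, gamma):
    """SVM output f(x) = sum_i dual_coef_i * K(sv_i, x) + bias, one value per row of x."""
    return rbf_kernel(x, support_vectors, gamma) @ np.asarray(dual_coef, dtype=float) + bias


def classify(x, support_vectors, dual_coef, bias, gamma, threshold=0.0):
    """Threshold the SVM output directly; sweeping `threshold` traces an ROC curve."""
    return svm_decision(x, support_vectors, dual_coef, bias, gamma) >= threshold
```

With one support vector per class at (0, 0) and (2, 2), an input at (0, 0) gives an output close to +1 and an input at (2, 2) an output close to −1.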


5.2.4 Results and Comparisons 


Both SVM and ANN are tested using the first 6 seizures in Group 2 data. 
In both cases all reasonable efforts are made to optimize performance within 
each patient. In the case of ANN several different configurations, including 
alterations to the number, type and layout of nodes, have been studied. Only the best results are reported here.

ROC curves (described in Section 5.1.2) summarizing results for each pa- 
tient are shown in Figure 5.4 for both ANN and SVM. Each plot shows the 
performance of the classifier when 1, 2 and 3 seizures are used during training. 
Also shown are the expected detection rates when the seizures are placed in 
random locations along the EEG record, averaged over 1000 trials. These are 
used to give an indication as to how much better than random each of the 
classifiers perform, given the distribution of the particular EEG record. Recall 
that the closer the ROC curves are to the top left corner, and the further they 
are from random, the better the performance. 
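The random detection rates can be estimated by Monte Carlo simulation: keep the detector output fixed, move the marked seizures to random times along the record, and average the resulting TPR over many trials (1000 in the text). The sketch below estimates only the chance TPR and makes several assumptions: seizures are placed uniformly without preventing overlaps, the 60 second tolerance of Section 5.1.2 is applied, and `random_baseline_tpr` is a hypothetical name.

```python
import numpy as np


def random_baseline_tpr(detection_times, record_s, seizure_lengths_s,
                        n_trials=1000, tolerance_s=60.0, seed=0):
    """Expected TPR when the marked seizures are moved to random times.

    detection_times: detector output times (s) for one record.
    seizure_lengths_s: durations (s) of the patient's marked seizures."""
    det = np.sort(np.asarray(detection_times, dtype=float))
    lengths = np.asarray(seizure_lengths_s, dtype=float)
    if det.size == 0:
        return 0.0
    rng = np.random.default_rng(seed)
    rates = np.empty(n_trials)
    for k in range(n_trials):
        onsets = rng.uniform(0.0, record_s - lengths)  # overlaps are not prevented
        lo = onsets - tolerance_s
        hi = onsets + lengths + tolerance_s
        # first detection at or after lo; clipped so the index stays valid
        idx = np.minimum(np.searchsorted(det, lo), det.size - 1)
        hit = (det[idx] >= lo) & (det[idx] <= hi)
        rates[k] = hit.mean()
    return float(rates.mean())
```

A detector that fires continuously scores a chance TPR of 1, which is why the ROC curves are judged by their distance from this baseline rather than by TPR alone.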

Patients 1, 2 and 3 are examples in which 100% TPR with near-zero FPR is achievable with both ANN and SVM. These patients have seizures whose features are both stereotyped and significantly different from the non-seizure EEG, and both algorithms are capable of differentiating the two. In Patient 3 SVM struggles a little more than ANN when only 1 seizure is used for training, but matches the performance of ANN once more seizures are used.

Separating seizure and non-seizure activity in Patient 4 is more difficult. The ROC curves for both ANN and SVM are much closer to random than in Patients 1-3, and the performance of ANN and SVM is comparable.

SVM outperforms ANN in both Patients 5 and 6. In Patient 5 SVM is capable of reaching a TPR of 100% as compared to 80% with ANN, albeit at high FPRs². In Patient 6 ANN performs no better than random, whilst SVM is again capable of a 100% detection rate when more than 1 seizure is used for training.

In Figure 5.5 a summary of the performance averaged over all patients is presented. In this figure 3 seizures have been used for training and 3 for testing. SVM is very good at detecting seizures it has been trained on, with a TPR above 90% achievable at FPRs as low as 4 per hour. ANN only reaches this TPR at FPRs as high as 15 per hour. For the testing set, both ANN and SVM perform comparably at reasonably low FPRs. This suggests that SVM is superior to ANN at recognizing features it has been trained on, whilst achieving comparable results to ANN with previously unseen data.

Arguably, although SVM does better in more of the patients, ANN and SVM perform reasonably similarly in most cases. However, from a more subjective point of view, the implementation and training of SVM require far fewer design decisions, are much faster, and are less dependent on the training examples than those of ANN. The small quantitative advantage that SVM has over ANN is magnified by these qualitative observations.


2 In all cases high FPRs are expected because little effort has been made to optimize the features used for evaluation. Again, this is because relative rather than best performance is under review.
In summary, the relative ability of ANN and SVM to discriminate
seizure from non-seizure EEG in a noisy environment has been
evaluated.


Quantitatively SVM outperforms ANN, albeit in many cases the
performance is comparable. SVM seems to be particularly apt at
identifying features it has been trained on, whilst still
maintaining a reasonably good ability to identify unseen data.
This is likely because SVM is trained on both seizure and
non-seizure data, whereas only seizure EEG is used to train the
SOM. Quantitative results can be seen in Figure 5.4 and Figure
5.5.


Qualitatively SVM is both easier to implement and less prone to
the idiosyncrasies of the data. This could admittedly be the
reason that SVM outperforms ANN (the best configuration for ANN
may not have been found), but it does not change the fact that
good results are easier to achieve with SVM.


5.3 Evaluation of Patient Un-Specific Seizure Detectors 


A typical EEG machine is capable of recording around 64 channels concurrently, generating gigabytes of information a day. Given that 95-99% of this data for epileptic patients appears useless for diagnosis [52], the manual review process is not only tedious, time consuming and expensive, but also prone to human error.

In this section a review of current published software-based automated patient un-specific seizure detectors that use scalp EEG data is conducted³. It is an evaluation of detection algorithms used for review of EEG data, where the requirement is that the detector identifies a seizure at least once during its progress, be this at the beginning, middle or end.

Although many hundreds of algorithms exist, only a subset of the most developed were implemented, since standardization is the purpose of this study. The selection was based on several criteria, including:


• Non-specific epilepsies: This is an evaluation of a broad range of seizure types. It specifically excludes work for which performance was only reported on absence seizures, such as that in [178], [175], [177] and [8].


• Volume and type of EEG database: Only work validated through testing on a large database is used. Furthermore, preference is given to algorithms that make a clear distinction between the training and testing datasets. Preference is also given to those whose database is recorded in the international 10-20 electrode configuration system.


• Adult EEG: The current database described in Section 5.1.1 does not include neonatal data, which differ significantly from the adult EEG available here [52].


• Normalized feature extraction: Preference is given to algorithms whose features are normalized. This avoids implementation of algorithms that depend on the specifics of the EEG measurement system.


3 This review does not look at spike detection even though it is also an important aspect in the diagnosis of epilepsy. Spike detection is a problem in its own right, with very different considerations from those of seizure detection.


Algorithms that are not implemented because they do not comply with these criteria may be compared to the benchmarks presented here once the database becomes publicly available.

Of the algorithms selected, only the latest published implementation is used, under the assumption that it improves on previous results. All efforts are made so that our work here remains true to the published material, including re-sampling and filtering of the database as well as selection of EEG channels that coincide with those used in the original publications. Unlike Section 5.2, this specifically excludes the optimization of results.

A total of 4 detection systems are implemented, three of which are commercially available solutions. Each algorithm is tested on the Group 1, Group 2 and Group 3 databases. Each algorithm and its results are summarized individually below, in chronological order of publication.


5.3.1 Algorithm 1: Monitor 
5.3.1.1 Algorithm Description 


Monitor is a commercially available detector whose origins date back to the early 1980s. Its performance has been shown to be inferior to that of many modern detectors, but due to its historical role in the problem of seizure detection it is included in this analysis as a reference.

192 Epileptic Seizures and the EEG

[Flowchart for Figure 5.6: Preprocessing (bipolar referencing, downsample to 200Hz) → Feature extraction (2 second epochs, no overlap) → Expert system]

FIGURE 5.6: Monitor detection algorithm summary. In the latest published original implementation k1 = 3 and k2 = 0.36. The ROC curves in Figure 5.7 are calculated by varying k1 (the value of k2 did not alter results significantly).

FIGURE 5.7: Monitor ROC, generated by varying k1 in Figure 5.6. A maximum TPR of 84.3% is achievable only when the FPR is as high as 11.9 per hour. A more moderate FPR of 8 per hour reduces the TPR to 60%. However, these calculations included 2 patients for whom the FPR was capped at a maximum of 15 per hour, and results are expected to be affected by this. The FPR in Group 3 is somewhat higher but overall not inconsistent with the FPR in Groups 1 and 2.

The algorithm is described in detail in [50] and [51] and summarized in Figure 5.6. Readers may also find it useful to refer to relevant material in [55], [52], [54], [139], [140] and [141]. Monitor is a multi-feature rule-based classifier. Three normalized features are extracted from each EEG channel after it is decomposed into segments (see [54]). These features are the amplitude of the current epoch relative to past averages (RA), a coefficient of variation that measures the variability in a signal (COV), and the average duration of these segments relative to the background. The expert system incorporates limited temporal and spatial contextual information by requiring detections to occur over more than one channel or more than one epoch.
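The core thresholding logic of Monitor can be sketched as follows. This is a hypothetical, simplified reading of the decision rules summarized in Figure 5.6, not a re-implementation of [50] and [51]. The signal is split into half-waves between successive local extrema. A channel is labelled when RA = AAE/AAB exceeds k1 and the squared coefficient of variation of the half-wave durations (COV) is below k2, with k1 = 3 and k2 = 0.36 as in the caption. The average half-wave duration must also lie between 25 ms and 150 ms. A seizure is reported when at least 2 channels are labelled in both the current and the next epoch. The artifact-rejection tests and the next-epoch amplitude test are omitted, and all function names are assumptions.

```python
import numpy as np


def half_waves(x, fs):
    """Split a signal at its local extrema; return half-wave amplitudes and durations (s)."""
    x = np.asarray(x, dtype=float)
    slope = np.sign(np.diff(x))
    turns = np.where(slope[1:] * slope[:-1] < 0)[0] + 1  # slope changes sign here
    ext = np.concatenate(([0], turns, [len(x) - 1]))
    return np.abs(np.diff(x[ext])), np.diff(ext) / fs


def channel_label(epoch, background, fs, k1=3.0, k2=0.36):
    """1 if this channel's epoch looks seizure-like, else 0 (sketch only)."""
    amp_e, dur_e = half_waves(epoch, fs)
    aae = amp_e.mean()                          # AAE: average amplitude of epoch
    aab = half_waves(background, fs)[0].mean()  # AAB: average amplitude of background
    ade = dur_e.mean()                          # ADE: average half-wave duration
    cov = dur_e.var() / ade ** 2                # COV: squared coefficient of variation
    if not 0.025 < ade < 0.150:                 # half-waves outside 25-150 ms
        return 0
    return int(aae > k1 * aab and cov < k2)     # RA = AAE/AAB > k1 and rhythmic


def epoch_detected(labels_now, labels_next, min_channels=2):
    """Report a seizure if enough channels are labelled in the current and next epoch."""
    return bool(np.sum(labels_now) >= min_channels and np.sum(labels_next) >= min_channels)
```

At 200Hz, a 5Hz rhythm ten times larger than a 16 second background of the same rhythm is labelled, whereas the same rhythm at twice the background amplitude, or a very fast alternating signal, is not.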


5.3.1.2 Results 


The algorithm was implemented verbatim from [50] and [51] using MATLAB version 7.1, and run on a standard 1400MHz Intel processor running the Microsoft Windows XP operating system. Code can be made available upon request. Since this is a pure thresholding technique, no training was required and the algorithm was applied to all available EEG data. Prior to analysis, data were pre-filtered and downsampled to 200Hz to coincide with the specifics of the dataset used in the publications.

The results are summarized in Figure 5.7. The ROC curves are generated by varying k1 shown in Figure 5.6, which is responsible for thresholding the relative amplitude RA = AAE/AAB. The parameter k2 was also varied, but results are much less sensitive to it.

Monitor has been evaluated many times, with results that vary widely depending on the database. Originally a TPR of roughly 73% with an FPR of approximately 1 per hour was reported in [51]. Since then, a TPR of 74.4% with an FPR of 3.02 per hour [44], a TPR of 31% with an FPR of 0.1 per hour [194] and a TPR of 43% with an FPR of 0.78 per hour [156] have been reported. Here a maximum TPR of 84.3% is achievable only when the FPR is as high as 11.9 per hour. A more moderate FPR of 8 per hour reduces the TPR to 60%. It is difficult to know which results to believe because little context about each implementation is known. The results here can be used as a reference since all algorithms are tested on a common dataset.

The discrepancies are most likely explained by differences in recording equipment. The data used in [51] were recorded with an EEG machine whose sampling rate is much lower. Although the data here are down-sampled to coincide with the published results, the waveforms could look significantly different because of other differences in the hardware (e.g., anti-aliasing filters, noise levels, etc.). Alternatively, the recording environment for the data included here may contain significantly higher levels of noise.

When Group 3 data are used the FPR increases on average, although not by much. The Group 3 database is not large enough to be conclusive, but the results indicate that initial calculations of FPR using databases taken from epileptic subjects are reasonably accurate.


5.3.2 Algorithm 2: CNet
5.3.2.1 Algorithm Description 


CNet is one of the first algorithms to use ANN. The detection algorithm is 
described in detail in [44] and [45] and summarized in Figure 5.8. It is a two- 
stage approach in which an initial evaluation is applied to groups of channels 
averaged together. If a preliminary detection occurs in any one group of 
channels further evidence from individual channels is gathered before a true 
detection is reported. 

Spectral information from both averaged and single EEG channels is extracted by computing spectrograms of 4-second, zero-overlap EEG epochs. Prior to computation, each epoch is filtered using a matched filter that selectively attenuates the frequencies least prevalent in the seizure training data. A 2DFFT⁴ is applied to the spectrogram so that spectral peaks are emphasized. The resulting feature vector is presented to a trained SOM (see Section 4.1.2 for a discussion of ANN and SOM) and its distance to each node is computed. The lowest log distance (LDE) is normalized by a 30 minute moving background (LDB) before thresholding.
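The CNet feature path can be sketched as follows: a spectrogram of a 4 second epoch, a 2DFFT computed row by row and then column by column, and the lowest log distance to the nodes of a trained SOM, compared against k times the background value. This is an illustrative sketch under assumptions, not the published implementation: the matched filter is omitted, the spectrogram window (1 second Hann windows with 0.5 second steps) is assumed, the magnitude of the 2DFFT is taken as the feature vector, and the SOM nodes are taken as given. All function names are hypothetical.

```python
import numpy as np


def spectrogram(x, fs, win_s=1.0, hop_s=0.5):
    """Power spectrogram (frames x frequency bins); window sizes are assumptions."""
    x = np.asarray(x, dtype=float)
    n, hop = int(win_s * fs), int(hop_s * fs)
    w = np.hanning(n)
    frames = np.array([x[i:i + n] * w for i in range(0, len(x) - n + 1, hop)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def fft2_rows_then_columns(m):
    """2DFFT: the FFT of each row, then the FFT of each column of the result."""
    return np.fft.fft(np.fft.fft(m, axis=1), axis=0)


def cnet_feature(epoch, fs):
    """Flattened magnitude of the 2DFFT of the epoch's spectrogram."""
    return np.abs(fft2_rows_then_columns(spectrogram(epoch, fs))).ravel()


def lowest_log_distance(feature, nodes):
    """LDE: log of the smallest Euclidean distance from the feature to any SOM node."""
    d = np.linalg.norm(np.asarray(nodes, dtype=float) - feature, axis=1)
    return float(np.log(d.min()))


def is_candidate(lde, ldb, k=0.8):
    """Preliminary detection rule LDE < k * LDB (k = 0.8 in the published version)."""
    return lde < k * ldb
```

At 128Hz a 4 second epoch has 512 samples, which with these assumed settings gives a 7 × 65 spectrogram and a 455-element feature vector per epoch.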

The SOM is trained on pre-selected epileptic examples (in the order of 
100 4-second epochs) chosen to demonstrate a wide variety of phenomena. 
More than one epoch from a single seizure may be used. An advantage of 
SOM is that no non-seizure data are necessary and thus over-representation 
of non-seizure data is not a problem. 

Other than ANN training the expert system consists of epoch rejection 
based on relative EEG amplitudes between current, past and future epochs. 
Temporal information is incorporated by requiring detections to occur at least 
twice in 15 seconds, but the only form of spatial contextual information is in 
the manner that groups are formed in the preliminary detection stage. 
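The temporal requirement can be sketched as a simple confirmation rule: a candidate detection is reported only if another candidate occurred within the preceding window. The window length is taken from the text (15 seconds) and exposed as a parameter; the function name is hypothetical.

```python
from typing import List, Sequence


def confirm_detections(candidate_times: Sequence[float], window_s: float = 15.0) -> List[float]:
    """Keep candidates preceded by another candidate within `window_s` seconds."""
    confirmed: List[float] = []
    previous = None
    for t in sorted(candidate_times):
        if previous is not None and t - previous <= window_s:
            confirmed.append(t)  # at least two candidates within the window
        previous = t
    return confirmed
```

Isolated candidates, such as a single 4 second epoch that happens to resemble a seizure, are therefore never reported on their own.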


4 A 2DFFT (2 dimensional FFT) is computed by taking the FFT of each row of a matrix, then the FFT of each column of the result. More details can be found in [44] and [45].
[Flowchart for Figure 5.8: Preprocessing (bipolar referencing, downsample to 128Hz, 0.5-40Hz filtering) → Feature extraction (4 second epochs, no overlap) → Expert system]
FIGURE 5.8: CNet detection algorithm summary. The ROC curves in Figure 
5.9 are calculated by varying k. In the latest published original implementa- 
tion k = 0.8. 
FIGURE 5.9: CNet ROC, generated by varying k in Figure 5.8. For Group 1
and 2 data a maximum TPR of 85.4% is achieved, coinciding with an FPR of
4.4 per hour. No patient exceeded an FPR of 15 per hour for these calculations.
The FPRs in Group 3 are again slightly higher.


5.3.2.2 Results 


The algorithm was implemented using MATLAB version 7.1 and run on a
standard 1400 MHz Intel processor running the Microsoft Windows XP operating
system. Code can be made available upon request. Since the original trained
ANNs were not available, performance was estimated by averaging 10 trials.
Each trial was trained using seizure data extracted from half the patients,
randomly selected from Groups 1 and 2, and tested on the remaining data.
Under no circumstances is a patient whose data are included in the training
set used during testing.

The average results are presented in Figure 5.9. These curves are generated 
by varying k in Figure 5.8. This parameter determines how different the 
current epoch must be from the 30 minute moving background to be detected 
as a potential seizure. 
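The ROC curves in this section are obtained by sweeping a detection threshold such as k. A minimal sketch of such a sweep follows, under assumptions that stand in for the metric definitions of Section 5.1: scores are per-epoch and higher means more seizure-like (for CNet's rule LDE < k·LDB one would pass the negated ratio LDE/LDB and negated thresholds), a seizure counts as detected when any of its epochs is flagged, and false positives are counted as distinct 60 second blocks per hour of non-seizure EEG. The function name is hypothetical.

```python
import numpy as np

def roc_sweep(scores, seizure_id, epoch_s, thresholds, block_s=60.0):
    """Event TPR and FPR (per hour) for each threshold.

    scores:     per-epoch detector output; an epoch is flagged when score > k
    seizure_id: per-epoch seizure index (0, 1, ...) or -1 for non-seizure epochs
    epoch_s:    epoch length in seconds
    """
    scores = np.asarray(scores, float)
    seizure_id = np.asarray(seizure_id)
    clean = seizure_id < 0
    seizures = np.unique(seizure_id[~clean])
    hours_clean = clean.sum() * epoch_s / 3600.0
    block = (np.arange(len(scores)) * epoch_s // block_s).astype(int)  # block of each epoch
    tpr, fpr = [], []
    for k in thresholds:
        flagged = scores > k
        # a seizure is detected if any of its epochs is flagged
        tpr.append(np.mean([flagged[seizure_id == s].any() for s in seizures]))
        # false positives: distinct blocks containing a flagged non-seizure epoch
        fpr.append(len(np.unique(block[flagged & clean])) / hours_clean)
    return np.array(tpr), np.array(fpr)
```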

Original reported performance is an average TPR of 90-93% and an FPR
between 1.25 and 1.39 per hour. Although the maximum TPR in Figure 5.9
(85.4%) is only slightly lower, it occurs at a significantly higher FPR of
4.4 per hour. This could again be a result of differences in the recording
environment, since their databases were recorded at much lower sampling
rates and their evaluation of Monitor also produced much lower FPRs than
those found here. CNet is evaluated independently in [194]; although only one
value is reported there, both the performance and the database are more
consistent with those here.

The FPRs of the non-epileptic Group 3 data are slightly higher. 
5.3.3 Algorithm 3: Reveal
5.3.3.1 Algorithm Description 


Reveal is a commercially available seizure detector designed to target rhyth- 
mic activity. The detection algorithm is described in detail in [194] and sum- 
marized in Figure 5.10. Other relevant material includes [191] and [192]. 

The matching pursuit algorithm (see Section 3.4) is applied to 2 second windows
with 1 second overlap to extract features known as Gabor atoms, which
describe the two most prominent rhythmic components in the epoch. The
amplitude, duration and frequency of each atom are then stored for analysis.
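Greedy matching pursuit over a Gabor dictionary can be sketched as below. The dictionary is deliberately small: atoms are centred in the window with zero phase, whereas practical implementations (and presumably Reveal) also search over time shifts and phases. The Gaussian width stands in for atom duration. The names and parameter grids are hypothetical.

```python
import numpy as np

def gabor_dictionary(n, fs, freqs, widths):
    """Unit-norm real Gabor atoms (Gaussian-windowed cosines) centred in the window."""
    t = (np.arange(n) - n / 2) / fs
    atoms, params = [], []
    for f in freqs:
        for w in widths:
            g = np.exp(-np.pi * (t / w) ** 2) * np.cos(2 * np.pi * f * t)
            atoms.append(g / np.linalg.norm(g))
            params.append((f, w))
    return np.array(atoms), params

def matching_pursuit(x, atoms, params, n_atoms=2):
    """Greedy matching pursuit: repeatedly take the atom best matching the residual."""
    residual = np.asarray(x, float).copy()
    found = []
    for _ in range(n_atoms):
        coefs = atoms @ residual               # inner products with every atom
        best = int(np.argmax(np.abs(coefs)))
        residual -= coefs[best] * atoms[best]  # remove that component
        f, w = params[best]
        found.append({"amplitude": coefs[best], "frequency": f, "width": w})
    return found, residual

# Usage: a 2 s window at 32 Hz (the Reveal preprocessing rate in Figure 5.10).
atoms, params = gabor_dictionary(64, 32.0, freqs=[2, 4, 6, 8], widths=[0.25, 0.5, 1.0])
```

Calling `matching_pursuit(window, atoms, params)` on one 64-sample window returns the two most prominent atoms with their amplitude, frequency and width.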

Unlike Monitor and CNet, in which relative values are computed using fixed
background windows, Reveal adaptively selects the best separation between
background and current epochs by analyzing the similarities in the time series.
A seizure is reported only when the difference between background and current
epochs is large enough. To determine this difference two rule-based tests and
four different ANN tests are performed. Rather than lumping all rules into a
single ANN, as with CNet, each ANN targets a narrower rule and is trained
separately. In this way all temporal and spatial context is contained within
the two rule-based and four ANN tests.


5.3.3.2 Results 


The algorithm was tested using a trial evaluation of Reveal version 2007.12.07,
downloadable from http://www.eeg-persyst.com/. This was preferred over
self-implementation because the training is complex and lengthy, and prone to
error if repeated. The software is designed to support data recorded using the
Compumedics equipment used here, so no transformations are necessary prior
to testing. Since training is provided by the suppliers of the software, all
EEG data are used for evaluation.

The average results are presented in Figure 5.11. These curves are generated
by varying the seizure perception threshold k shown in Figure 5.10.

The reported performance in [194] is a maximum TPR of about 84% at a
very low FPR of 0.6 per hour. The maximum TPR in Figure 5.11 is 77.5% with
a corresponding FPR of 2.8 per hour. This FPR can be reduced to roughly
1 per hour when visually unreliable channels are manually removed, as shown
in green. However, to be fair, if this leniency is afforded here it should also
be applied to the other algorithms. For example, removing bad channels in
Monitor reduces its FPR, as shown in Figure 5.7. CNet remains relatively
unaffected, and this curve is not shown. Since these unreliable channels were
often the result of loose or disconnected electrodes, an artifact that is not
difficult to identify, Reveal would benefit from automated removal of such
channels.

Again, the FPRs of the non-epileptic Group 3 data are slightly higher. 
[Figure 5.10 flowchart; box text recovered from the extraction:]
Data available (2 second epochs, 1 second overlap). Stages: preprocessing, feature extraction, expert system.
Preprocessing: bipolar referencing, downsampling to 32 Hz.
Feature extraction for each channel:
1. Compute matching pursuit and extract 2 atoms.
2. Compute background and foreground lengths.
3. Classify each atom by rhythmicity and duration.
4. Retain the most rhythmic atom.
5. Compute the candidate seizure score.
If the channel yields a seizure candidate, the best candidate score for the channel is retained and the channel is labelled 1 (seizure perceived in this channel); otherwise it is labelled 0 (no seizure perceived in this channel). This is repeated while more channels remain to be processed.
Expert system: compose the seizure likelihood using all channels; sort the seizure scores in descending order and analyse the top 3 channels to compute the seizure perception likelihood (P), from which the outcome (seizure detected or no seizure detected) is decided.
FIGURE 5.10: Reveal detection algorithm summary. Much of the complexity
in this algorithm has been shifted to the training of 6 individual rules, 4 of
which are ANNs, that target different aspects of seizure detection and include
the spatial and temporal context. The seizure perception threshold k is a
user-defined variable ranging from 0 to 1.
FIGURE 5.11: Reveal ROC, generated by varying k in Figure 5.10. The
maximum achievable TPR is 77.5% with a corresponding FPR of 2.8 per
hour, a poorer performance than the originally published results in [194], which
give a maximum TPR of 84% at a very low FPR of 0.6 per hour. However,
by manually omitting unreliable channels the FPR can be reduced to about
1 per hour, as shown.


5.3.4 Algorithm 4: Saab 
5.3.4.1 Algorithm Description 


The Saab algorithm, named here after its principal author, was designed by the
same research group that also developed Monitor. The detection algorithm is
described in detail in [156] and summarized in Figure 5.12. Readers may also
be interested in an intra-cranial EEG implementation discussed in [57].

Similar features (namely RA and COV) are used, but they are applied to the
coefficients of a wavelet decimation at different scales rather than to the raw
data. A third feature, RSE, computes the energy in each of the wavelet scales
relative to the energy at all scales. The 5 scales used represent 50-100 Hz,
25-50 Hz, 12-25 Hz, 6-12 Hz and 3-6 Hz. Scales 2, 3 and 4 are used
to characterize epileptic events because most seizures contain frequencies in
these ranges. The remaining scales are used to identify artifacts.
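The relative scale energy can be sketched with a Haar wavelet decimation written directly in NumPy. The Haar wavelet is an assumption (the wavelet family used in [156] is not given here). At the 200 Hz sampling rate of Figure 5.12, detail scales 1 to 5 nominally cover the 50-100 Hz down to 3-6 Hz bands listed above, although Haar filters overlap considerably. RSE is taken here relative to the summed energy of the five detail scales, and the function names are hypothetical.

```python
import numpy as np

def haar_details(x, levels=5):
    """Detail coefficients of a Haar wavelet decimation, finest scale (scale 1) first."""
    a = np.asarray(x, float)
    details = []
    for _ in range(levels):
        a = a[: len(a) // 2 * 2]                          # drop an odd trailing sample
        details.append((a[0::2] - a[1::2]) / np.sqrt(2))  # high-pass half
        a = (a[0::2] + a[1::2]) / np.sqrt(2)              # low-pass half, decimated
    return details

def relative_scale_energy(x, levels=5):
    """Energy in each detail scale divided by the energy summed over all detail scales."""
    energies = np.array([np.sum(d ** 2) for d in haar_details(x, levels)])
    return energies / energies.sum()

# Usage: one 2 s epoch sampled at 200 Hz.
rse = relative_scale_energy(np.random.default_rng(0).standard_normal(400))
```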

The innovation in Saab is its purely probabilistic approach to classification.
For each channel the conditional probability that a seizure occurs given the
features RA, COV and RSE, P(seizure|features), is computed using Bayes'
theorem:

$$P(\text{seizure} \mid \text{features}) = \frac{P(\text{features} \mid \text{seizure})\,P(\text{seizure})}{P(\text{features})} \qquad (5.4)$$

The terms on the right are estimated from training observations. RA, COV
and RSE in each scale are collected from seizure and non-seizure training
examples, the total range of each is divided into 5 distinct levels, and the
probability of each level is recorded. Detailed explanations of the training
process are found in [156].
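A minimal sketch of this training and of Equation 5.4 is given below. It assumes that the features are conditionally independent given the class (a naive Bayes simplification that [156] may not make), expands P(features) over the seizure and non-seizure classes, uses the training class proportions as the prior P(seizure), and adds one to every level count so that unseen levels do not give zero probability. All names are hypothetical.

```python
import numpy as np

N_LEVELS = 5

def quantize(values, lo, hi):
    """Map feature values onto N_LEVELS equal-width levels spanning [lo, hi]."""
    edges = np.linspace(lo, hi, N_LEVELS + 1)[1:-1]  # interior level boundaries
    return np.digitize(values, edges)                 # 0 .. N_LEVELS - 1

def level_probs(levels):
    """Relative frequency of each level, with add-one smoothing."""
    counts = np.bincount(levels, minlength=N_LEVELS) + 1
    return counts / counts.sum()

def train(seizure_feats, background_feats):
    """Per-feature level probabilities for each class (rows are epochs, columns features)."""
    allf = np.vstack([seizure_feats, background_feats])
    ranges = [(allf[:, j].min(), allf[:, j].max()) for j in range(allf.shape[1])]

    def table(feats):
        return [level_probs(quantize(feats[:, j], *ranges[j])) for j in range(feats.shape[1])]

    prior = len(seizure_feats) / len(allf)
    return ranges, table(seizure_feats), table(background_feats), prior

def p_seizure(x, model):
    """Bayes' theorem (Eq. 5.4) with P(features) expanded over both classes."""
    ranges, p_sz, p_bg, prior = model
    joint_sz, joint_bg = prior, 1.0 - prior
    for j, (lo, hi) in enumerate(ranges):
        lvl = quantize(np.array([x[j]]), lo, hi)[0]
        joint_sz *= p_sz[j][lvl]
        joint_bg *= p_bg[j][lvl]
    return joint_sz / (joint_sz + joint_bg)
```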
[Figure 5.12 flowchart; box text recovered from the extraction:]
Data available (2 second epochs, no overlap).
Preprocessing: bipolar referencing, downsampling to 200 Hz, 0.5-70 Hz filtering.
Feature extraction for artifact rejection:
1. Compute the amplitude of the epoch (AA).
2. Compute the 60 Hz amplitude (SA).
3. Check whether there is a phase reversal artifact (PR = yes/no).
Reject the epoch if AA > 1000 µV, or SA > 60 µV, or PR = yes; otherwise continue.
Initial decomposition:
1. Compute wavelet coefficients for scales 1-5, representing frequency ranges 50-100 Hz, 25-50 Hz, 12-25 Hz, 6-12 Hz and 3-6 Hz.
2. Decompose each scale in the current channel epoch into half waves (in the same manner as in Monitor).
Feature extraction for the half waves in each scale (step 2 of this list is missing from the extracted text):
1. Compute the relative average amplitude of the current epoch to the background (30 seconds with a 60 second gap) (AR).
3. Compute the square of the coefficient of variation of the epoch (COV).
4. Compute the relative scale energy (RSE).
5. Compute the EMG ratio of scale 1 to scale 2 (EMG).
Determine probabilities based on training examples:
1. Sum the probability of seizure given AR, COV, RSE and EMG for scales 2, 3 and 4 (PSZ_CHAN).
2. Use the probability of alpha activity given AR, COV and RSE in scale 2 (PAL_CHAN).
Repeat while more channels remain to be processed, then compute the final epoch features:
1. Sum the 6 largest PSZ_CHAN and scale by (1 - EMG); sum over 3 epochs (PSZ).
2. If the 6 largest PSZ_CHAN are in the same hemisphere, T = 1, else T = 0.
3. Sum the 6 largest PAL_CHAN of the occipital channels; sum over 3 epochs (PAL).
The final decision (seizure detected or no seizure detected) thresholds PSZ with k1 and PAL with k2; the exact branch conditions are not recoverable from the extracted text.
FIGURE 5.12: Saab detection algorithm summary. In the original published
implementation k1 = 4 and k2 = 2. The ROC curve in Figure 5.13 is calculated
by varying k1 (the value of k2 did not alter results significantly).
FIGURE 5.13: Saab ROC, generated by varying k1 in Figure 5.12. A maximum
TPR of 84.3% is achievable only when the FPR is as high as 12.6 per
hour. A more moderate FPR of 7 per hour reduces the TPR to 60%. No patient
had an FPR higher than 15 per hour, but several were close. Surprisingly, the
FPR in Group 3 is significantly lower than that in Groups 1 and 2, a result
not observed with any other algorithm.


The expert system incorporates more sophisticated artifact rejection that
ignores epochs with disconnected or loose electrodes as well as those with
abnormally large amplitudes. Alpha rhythm data are also rejected through
the computation of the conditional probability P(alpha|features), trained in
a similar manner as in Equation 5.4. Spatial contextual information is
incorporated by summing P(seizure|features) over the 6 channels with the
highest probabilities and placing importance on the location of these channels.
In this way evidence is amassed over several channels, providing a likelihood
rather than a probability that a seizure is present in any one epoch.
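The spatial aggregation step can be sketched as follows, using the quantities listed in Figure 5.12. This is an illustration under assumptions: EMG is treated as a single per-epoch value in [0, 1], hemisphere labels are supplied per channel, and the function names are hypothetical; the actual weighting in [156] may differ.

```python
import numpy as np

def epoch_psz(psz_chan, emg, hemisphere, n_top=6):
    """Sum of the n_top largest channel probabilities, scaled by (1 - EMG).

    Also returns T = 1 if all of those channels lie in the same hemisphere.
    """
    p = np.asarray(psz_chan, float)
    top = np.argsort(p)[::-1][:n_top]                  # indices of the largest values
    total = p[top].sum() * (1.0 - emg)
    same_side = len(set(np.asarray(hemisphere)[top])) == 1
    return total, int(same_side)

def psz_over_epochs(per_epoch, n_epochs=3):
    """Trailing sum of the per-epoch scores over the last n_epochs epochs (PSZ)."""
    s = np.asarray(per_epoch, float)
    return np.convolve(s, np.ones(n_epochs))[: len(s)]
```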


5.3.4.2 Results 


The algorithm was implemented using MATLAB version 7.1 and run on a
standard 1400 MHz Intel processor running the Microsoft Windows XP operating
system. As with CNet a training phase is required, and the same 10 trial
combinations used in Section 5.3.2 are used for averaging. However, unlike
CNet, the training data included non-seizure as well as seizure data.

The average results are presented in Figure 5.13, generated by varying the
parameter k1 shown in Figure 5.12, which thresholds the overall seizure
likelihood PSZ. Varying k2, the threshold for the alpha activity likelihood PAL,
was also attempted, but results are less sensitive to this variable. No real
difference in results is observed when unreliable channels are manually
removed, and thus this curve is not shown.

The results reported in the literature are as high as a TPR of 76% with an
FPR of 0.34 per hour. These results differ significantly from those observed
here, where an equivalent TPR comes with at least 10 false positives per hour.
Whilst an improvement on Monitor is made, as reported in [156], this improvement



does not compare with the superior performance of CNet or Reveal. The
authors were invited to submit their own code with their own training
sequences, but no reply was received.

As with Monitor, the discrepancies may be explained by the EEG acquisition
in our database, which may be more prone to noise. Another possible reason is
the difficulty in selecting the data used to train the algorithm. Like CNet,
this algorithm required 10 different trials with 10 different training
sequences. However, in contrast to CNet, whose performance over the 10 trials
was relatively consistent, Saab's performance varied considerably between
trials. Furthermore, training relies on both epileptic and non-epileptic data
and therefore suffers from the over-representation problems of ANNs, which
CNet avoids through the use of a SOM. Saab depends heavily on the data used
in the training process, whereas CNet is more robust to the choice of training
data.

This is the only algorithm whose results differ significantly for the
non-epileptic database; in this case a lower FPR is achieved for Group 3.


5.3.5 Comparisons and Conclusions 


Good performance for a patient non-specific seizure detector, where detection
at any time during a seizure is sufficient, is a moderately high TPR with a
moderately low FPR (see Table 5.3). What counts as moderate is left up to the
individual. Here it is proposed that an FPR of 6 per hour is reasonable, since
each false positive marks a 60 second block, so this rejects at least 90% of
the record that must be reviewed. This is significantly better than, say, an
FPR of 15 per hour, which rejects only 75% of the record. On the matter of
sensitivity, if experts only agree with each other 80% of the time (see
Section 5.1), then anything above this value outperforms human classification.
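The arithmetic behind these percentages can be checked with a few lines, assuming each false positive flags at most one 60 second block for review. The function name is hypothetical.

```python
def review_fraction(fpr_per_hour, block_s=60.0):
    """Upper bound on the fraction of each hour flagged for review by false positives."""
    return min(1.0, fpr_per_hour * block_s / 3600.0)

for fpr in (6, 15):
    flagged = review_fraction(fpr)
    print(f"FPR {fpr}/h: at most {flagged:.0%} reviewed, at least {1 - flagged:.0%} rejected")
# FPR 6/h: at most 10% reviewed, at least 90% rejected
# FPR 15/h: at most 25% reviewed, at least 75% rejected
```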

The relative performance of all algorithms can be compared from Figure
5.7, Figure 5.9, Figure 5.11 and Figure 5.13. From these it is concluded that
CNet provides the highest TPR, although if a slightly lower TPR is acceptable,
Reveal achieves a significantly lower FPR. Monitor and Saab produce less
acceptable FPRs for this database.

However, the above observations do not provide a complete picture. These
results are averages, where each plotted value is calculated using the same
threshold for all patients. Although the averages indicate general performance
trends, they may be unrepresentative when poor performance in any one patient
could be improved dramatically simply by adjusting a user-defined threshold.
This type of patient specificity can be applied to all algorithms in a fair
manner, and is presented in Table 5.4 and Table 5.5.

Table 5.4 reports the best achievable TPR when a maximum FPR of 6 
per hour is allowed. When an FPR of 6 is not possible the lowest FPR above 
this threshold and its respective TPR is reported. This gives an idea of how 
good performance is for a reasonable number of false detections. The results 
for CNet and Reveal remain relatively unchanged because all patients in both
have average FPRs lower than this limit. However, the results for Monitor and
Saab change significantly, because tuning to a particular patient can improve
results considerably. For example, prior to tuning Monitor was shown to
have an average TPR of 60% with an FPR of 8 per hour. After tuning, Monitor
can achieve a TPR of 68.5% with an FPR of 5.87 per hour. Some patients,
namely 1, 2 and 5 in Group 1, exhibit very high FPRs, close to and exceeding
15 per hour. These patients skew the results, whereas the algorithm performs
adequately for the majority of patients. Similar observations hold for Saab.
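The per-patient selection rule used for Table 5.4 (and, with a 15 per hour limit, Table 5.5) can be sketched as follows. The representation of a patient's ROC curve as (TPR, FPR) pairs, the tie-breaking towards lower FPR and the function name are assumptions for illustration.

```python
def best_under_fpr(curve, max_fpr=6.0):
    """Pick one operating point from a patient's ROC curve.

    curve: list of (tpr, fpr) pairs, one per threshold.
    Returns the highest-TPR point with fpr <= max_fpr (lowest FPR on ties);
    if no such point exists, the point with the lowest FPR above max_fpr.
    """
    allowed = [p for p in curve if p[1] <= max_fpr]
    if allowed:
        return max(allowed, key=lambda p: (p[0], -p[1]))
    return min(curve, key=lambda p: (p[1], -p[0]))

print(best_under_fpr([(0.5, 2), (0.7, 5), (0.7, 4), (0.9, 9)]))  # (0.7, 4)
```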

Table 5.5 reports the best achievable TPR when a maximum FPR of 15
per hour is allowed. This gives an idea of the best possible TPR of the al- 
gorithm, but also how FPR is affected by this performance. Again CNet and 
Reveal remain relatively unchanged because their average FPRs are well be- 
low the maximum allowable 15 per hour. However in both Monitor and Saab 
significant improvements can be made, at the expense of a much increased 
FPR. 

In addition, Table 5.6 shows a patient-specific summary of FPR in Group 3.
These FPRs correspond to an average TPR of 70% or above⁵. With a similar
performance metric for all patients, the FPRs may be compared on a relatively
even playing field. Again, it is evident that Monitor incurs very high FPRs
to achieve equivalent results. This is not the case for Saab, where Group 3
yields much lower FPRs than Groups 1 and 2. The Group 3 database must be
expanded before definitive conclusions can be made.

Specificities are also provided in these tables for readers who find these
numbers more intuitive, and they convey some additional information.
Whereas FPR is calculated over 60 second blocks, specificity S is not. For
example, in Table 5.4 CNet achieves an average FPR of about double that of
Reveal (4.03 versus 1.84 per hour), whilst the specificity of CNet is less than
half that of Reveal (0.84% versus 2.26%). From this information we can deduce
that CNet generates false positives that are short in duration and far apart,
whereas those in Reveal are generally lumped together.
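The inference above can be illustrated numerically. The sketch assumes that S is the percentage of non-seizure time covered by false detections (consistent with the percentages quoted, though Section 5.1 is not reproduced here) and that FPR counts 60 second blocks containing at least one false detection, per hour. Short, scattered false alarms touch many blocks but little time; one long burst touches few blocks but more time. The function name is hypothetical.

```python
import numpy as np

def fpr_and_specificity(false_alarm, block_s=60):
    """FPR (blocks per hour) and S (% of time) from a per-second false-alarm mask.

    false_alarm: boolean array, one entry per second of non-seizure EEG,
    True where a detection is (wrongly) active.
    """
    fa = np.asarray(false_alarm, bool)
    n_blocks = int(np.ceil(len(fa) / block_s))
    padded = np.zeros(n_blocks * block_s, bool)
    padded[: len(fa)] = fa
    blocks_hit = padded.reshape(n_blocks, block_s).any(axis=1).sum()
    return blocks_hit / (len(fa) / 3600.0), 100.0 * fa.mean()

scattered = np.zeros(3600, bool)
scattered[[10, 700, 1500, 3000]] = True     # four 1 s false alarms, far apart
clustered = np.zeros(3600, bool)
clustered[30:90] = True                     # one 60 s burst
print(fpr_and_specificity(scattered))       # FPR 4 per hour, S about 0.11%
print(fpr_and_specificity(clustered))       # FPR 2 per hour, S about 1.67%
```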


In summary, Figures 5.7, 5.9, 5.11, 5.13, Table 5.4, Table 5.5 
and Table 5.6 demonstrate that 


• The overall best TPR is achieved by CNet.

• The overall best FPR, with a sufficiently high TPR, is achieved by
Reveal.

• The performance of Monitor and Saab can be significantly improved on
a patient specific basis, but overall performance remains inferior to CNet
and Reveal.


⁵A TPR of 70% was selected because this is a reasonable performance that is achieved
for most patients by most algorithms.
The advantages of using a common dataset for evaluation
become clear: had the reported performances been believed
unconditionally, these observations would not be evident. A fairer
evaluation would include data recorded with different equipment,
so that biases affecting algorithms such as Monitor and Saab,
which were developed on data sampled at much lower frequencies,
may be removed.


5.4 Evaluation of Onset Seizure Detectors 


For many applications it is important to detect the beginning of the seizure.
In this section we evaluate detection strategies based on how quickly a seizure
is marked once it has begun. To narrow the scope, we concentrate on the class
of problems where detection of a seizure must occur before clinical onset,
applicable to situations such as the delivery of anti-epileptic drugs at critical
moments. Only Group 4 data are used because onset detectors of this nature
are most often applied to intra-cranial data, where a much longer lead-up to
seizures is frequently visible.

The onset delay metric described in Section 5.1.2 is defined here relative to
the clinical onset of a seizure. The electrographic onset does not necessarily
coincide with the clinical onset and may occur seconds or minutes earlier. A
positive onset delay indicates a detection before the clinical onset, whilst a
negative one indicates detection after clinical onset. We rename this metric
the warning time.


5.4.1 Feature Extraction 


A few complete algorithms for the detection of intra-cranially measured seizures
have been reported; see, for example, the work in [128], [127] and [59].
However, the focus of the study here is to evaluate how well the different types
of statistics presented in Chapter 3 can detect seizure onsets⁶. We select one
time domain feature (cross correlation), one frequency domain feature (PSD),
one time-frequency based method (wavelets) and one non-linear statistic
(correlation dimension).


⁶A study of this nature exists in [74] and [75]. However, the methods of these studies
are not available, and furthermore TPR and FPR metrics are not included. Here we aim
to address this.
                      MONITOR           CNET              REVEAL            SAAB
Av. TPR               73%               76%               72%               71%
Av. FPR               9.65 p/h          2.83 p/h          2.24 p/h          8.83 p/h
Pat                   FPR       S       FPR       S       FPR       S       FPR       S
                      (p/hour)  (%)     (p/hour)  (%)     (p/hour)  (%)     (p/hour)  (%)
Average Performance
Group 3               11.54     4.59    5.42      1.17    3.26      2.2     5.22      1.87


TABLE 5.6: Patient specific results for the evaluation of general seizure
detectors, described in Section 5.3. Here the FPR and specificity for Group 3
data are reported when the average TPR of each algorithm is higher than (but
as close as possible to) 70%. This value is chosen for comparison because
it is a reasonable (although low) performance that is nevertheless achievable
by all algorithms. Particularly in the case of Monitor, some patients exhibit
much higher FPRs than others, capped at a maximum of 15 per hour.


All methods use 10 second windows with an overlap of 9 seconds. A much
longer window than that in Section 5.3 is used because seizure onsets are
expected to be much slower intra-cranially. This makes the computed
statistics more robust but can result in shorter warning times. A 3-40 Hz
bandpass filter is applied, and data are normalized to unit variance (see
Equation 3.13) prior to analysis. All features are extracted as described
next, and then thresholded in a patient-specific manner.


5.4.1.1 Cross Correlation (XCORR) 


The methods used to compute the cross correlation between two signals are
found in Section 3.3.1. Specifically, the normalized cross correlation defined 
in Equation 3.19 is used. This statistic has a maximum value of 1 when two 
signals are identical. 

The normalized cross correlation is computed for all 30 possible combinations
of the 6 channels in each patient. Only the maximum cross correlation,
observed across all channels and all time delays, is recorded for thresholding
in each 10 second window.
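A sketch of this statistic for one 10 second window follows. Equation 3.19 is not reproduced here; the sketch assumes the normalization is obtained by z-scoring each channel and dividing the raw cross correlation by the number of samples, which gives exactly 1 at zero lag for identical signals. The 30 combinations are taken as the ordered pairs of 6 channels. The function name is hypothetical.

```python
import numpy as np
from itertools import permutations

def max_normalized_xcorr(window):
    """Largest normalized cross correlation over all ordered channel pairs and lags.

    window: array (n_channels, n_samples).
    """
    x = np.asarray(window, float)
    x = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
    n = x.shape[1]
    best = -np.inf
    for i, j in permutations(range(x.shape[0]), 2):   # 30 ordered pairs for 6 channels
        c = np.correlate(x[i], x[j], mode="full") / n  # every lag
        best = max(best, c.max())
    return best
```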


5.4.1.2 Power Spectral Density (PSD) 


The methods used to compute the power spectral density (PSD) of a signal
can be found in Section 3.3.2. Specifically, the PSD is computed as given by
Equation 3.31. P = 10 averages are used over a 10 second window, resulting
in a 1 Hz resolution of the computed PSD. A Hanning window 1 second long
(512 samples at sampling rate F_s = 512 Hz) is applied to each segment prior
to computation of its PSD.

To obtain a normalized statistic, the ratio of low to high frequency energy
content is computed. This is done by summing the PSD over the 3-8 Hz range
and dividing by the sum over the 9-30 Hz range. During a seizure a
characteristic shift in the relative energy, typically towards lower frequencies,
is expected [74]. Note that because we are working with ratios it is not
necessary to normalize the computed PSD as per Equation 3.30.

In each 10 second window the ratio is computed for each of the 6 available 
channels. The largest ratio is retained for thresholding. 
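A sketch of the PSD ratio follows, assuming the averaged periodogram of Equation 3.31 can be approximated by averaging Hanning-windowed periodograms of P = 10 non-overlapping 1 second segments. No PSD scaling is applied, since it cancels in the ratio. The function names are hypothetical.

```python
import numpy as np

FS = 512  # sampling rate in Hz, as stated in the text

def averaged_psd(x, fs=FS, seg=FS):
    """Average of Hanning-windowed periodograms over non-overlapping 1 s segments."""
    x = np.asarray(x, float)
    n_seg = len(x) // seg
    segs = x[: n_seg * seg].reshape(n_seg, seg) * np.hanning(seg)
    psd = np.mean(np.abs(np.fft.rfft(segs, axis=1)) ** 2, axis=0)
    return np.fft.rfftfreq(seg, d=1.0 / fs), psd   # 1 Hz bins for 1 s segments

def low_high_ratio(x, fs=FS):
    """Sum of the PSD over 3-8 Hz divided by the sum over 9-30 Hz."""
    f, p = averaged_psd(x, fs)
    return p[(f >= 3) & (f <= 8)].sum() / p[(f >= 9) & (f <= 30)].sum()

def window_statistic(window, fs=FS):
    """Largest ratio across channels for one 10 s window (n_channels x n_samples)."""
    return max(low_high_ratio(ch, fs) for ch in window)
```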


5.4.1.3 Wavelet Analysis (WAV) 


The methods used to compute the wavelet coefficients of a signal at different
scales are exactly those used in Figure 3.18(a) and described in Section 3.3.3.
To obtain a statistic analogous to the PSD ratio of low to high frequencies,
scales m = 5-6 (corresponding to frequencies 2-8 Hz) are summed and divided
by the sum of scales m = 3-4 (corresponding to 8-32 Hz). Thus this statistic
targets (qualitatively) the same phenomena as the PSD statistic, but in a
different way.

Again, in each 10 second window the ratio is computed for each of the 6 
available channels. The largest ratio is retained for thresholding. 
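The wavelet ratio can be sketched in the same spirit. Assumptions: a Haar decimation stands in for the decomposition of Figure 3.18(a), "summed" is read as summing the energy of the detail coefficients at each scale, and scale m is simply the m-th decimation level. How levels map onto the 2-8 Hz and 8-32 Hz bands depends on the sampling rate and the conventions of Section 3.3.3, which this sketch does not reproduce. The function names are hypothetical.

```python
import numpy as np

def haar_detail_energies(x, levels=6):
    """Energy of the Haar detail coefficients at scales m = 1..levels (finest first)."""
    a = np.asarray(x, float)
    energies = []
    for _ in range(levels):
        a = a[: len(a) // 2 * 2]                                   # even length
        energies.append(np.sum(((a[0::2] - a[1::2]) / np.sqrt(2)) ** 2))
        a = (a[0::2] + a[1::2]) / np.sqrt(2)                       # next coarser level
    return np.array(energies)

def wavelet_ratio(x):
    """Energy at scales m = 5-6 divided by energy at scales m = 3-4."""
    e = haar_detail_energies(x, levels=6)
    return (e[4] + e[5]) / (e[2] + e[3])
```

As with the PSD statistic, the ratio would be computed for each of the 6 channels and the largest value kept.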


5.4.1.4 Correlation Dimension (CD) 


The methods used to compute the correlation dimension of a signal can be
found in Section 3.3.4. First a time-delay reconstruction as in Equation 3.35
is performed using τ = 30 samples (about 0.06 seconds at 512 Hz sampling)
and an embedding dimension of 4. The correlation sum is then computed
(Equation 3.38), and the gradient in Equation 3.39 is used as an estimate of
the correlation dimension. The values ε_lower = 0.1 and ε_upper = 2 are used
for all data, which is consistent because every record has been normalized
to unit variance prior to analysis.

Unlike other statistics the correlation dimension is computed on all 6 chan- 
nels at once, i.e., all channels are lumped into the one calculation. The ra- 
tionale behind this is that the correlation dimension requires more data to be 
robustly computed; moreover the complexity of a system should be viewed 
as a whole and not on individual channels. This is done for each 10 second 
window. 
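A compact sketch of the correlation dimension estimate follows, with τ = 30 samples, embedding dimension 4, and a least-squares slope of log C(ε) against log ε between ε_lower = 0.1 and ε_upper = 2. Two choices are assumptions not stated in the text: the embedded points of all channels are pooled into one cloud, and the cloud is subsampled to keep the pairwise-distance computation tractable. The function names are hypothetical.

```python
import numpy as np

def delay_embed(x, tau=30, dim=4):
    """Time-delay reconstruction: row t is [x(t), x(t+tau), ..., x(t+(dim-1)tau)]."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau: i * tau + n] for i in range(dim)])

def correlation_sum(points, eps):
    """Fraction of distinct point pairs closer than each radius in eps."""
    diff = points[:, None, :] - points[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))[np.triu_indices(len(points), k=1)]
    return np.array([np.mean(d < e) for e in eps])

def correlation_dimension(window, tau=30, dim=4, eps_lower=0.1, eps_upper=2.0,
                          n_eps=10, max_points=1000):
    """Slope of log C(eps) against log eps for eps between eps_lower and eps_upper.

    window: array (n_channels, n_samples), each channel normalized to unit variance.
    Embedded points from all channels are pooled into one cloud (an assumption).
    """
    pts = np.vstack([delay_embed(np.asarray(ch, float), tau, dim) for ch in window])
    pts = pts[:: int(np.ceil(len(pts) / max_points))]  # at most max_points points
    eps = np.logspace(np.log10(eps_lower), np.log10(eps_upper), n_eps)
    c = correlation_sum(pts, eps)
    keep = c > 0                                       # log C is undefined where C = 0
    if keep.sum() < 2:
        return np.nan
    slope, _ = np.polyfit(np.log(eps[keep]), np.log(c[keep]), 1)
    return slope
```

The quadratic cost of `correlation_sum` is what makes this statistic far more expensive than the linear ones.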


5.4.2 Results and Comparisons 


The features were extracted from the 3 datasets in the Group 4 EEG records and
visually inspected for characteristics during a seizure. With the exception of
the cross correlation, all statistics are capable of detecting at least part of the
majority of seizures. In fact these results are quite good compared to those
FIGURE 5.14: ROC curves for each of the 3 patients in Group 4 data, using
the metrics defined in Section 5.4.1. Not all tests are appropriate for all
patients, and only the ones in which characteristic changes are visible during
the beginning of a seizure are shown. Overall, correlation dimension performs
best when compared with the linear statistics because it achieves 100%
TPR with zero specificity in all 3 patients. Cross correlation performs the
worst because it is not applicable to 2 out of 3 patients, and has the highest
specificity for 100% TPR for Patient 1.
FIGURE 5.15: Onset delay evaluation. Each graph shows the computed onset
warning time for one patient, averaged over all detected seizures. The variance
of each computed mean is shown as horizontal lines on this mean, and the
specificity for different thresholds is shown on the vertical axis. The clinical
seizure occurs at time zero, so that positive values indicate a detection is
made before the clinical onset. Here it is seen that the cross correlation in
column 1 is only applicable to Patient 1 and, coupled with results in Figure
5.14(a), provides a warning time of about 40 seconds with 100% TPR and 5%
FIGURE 5.15: (Continued) In columns 2 and 3 the results of the PSD and
wavelet statistics are shown. Both perform similarly, which is not surprising
given that they are designed to target the same information. Again Patient 1
performs well, with clear, consistent warning times. Results for Patients 2 and
3 are less favorable, with no consistent positive warning times possible with
either of these statistics. In column 4 the correlation dimension shows a much
better performance for all patients; coupled with 100% TPR in Figure 5.14 it
is capable of giving positive warning times in most cases. However, higher
specificities must be tolerated for Patients 2 and 3, and this improved
performance comes at the expense of a much greater computational cost.
reported in Section 5.3, highlighting the importance of both patient specificity
and the differences between scalp and intra-cranial EEG. However, since
the focus of this study is on onset/warning time, greater importance was
placed on identifying characteristic behavior at the beginning of the seizure,
and in particular behavior that occurs before the clinical onset. For example,
the slower oscillations of seizures lead to a larger value of the PSD statistic
during a seizure. However, in Patient 1 there is a period of high frequency
activity in the lead-up to generalization, and the onset detector is based on a
decrease of the PSD statistic. The result is earlier detection in general, at the
expense of some undetected events. The test derived for each of the statistics
is patient specific, as is appropriate for the class of problems under investigation.

First it is important to see, on a patient-by-patient basis, how well the
features are able to detect seizures. The results are shown in Figure 5.14 when
thresholds are varied. Only the statistics for which characteristic changes were
identified are shown in each patient's ROC curves. Specificities rather than
FPRs are reported because in this application it is more useful to know the
percentage of time that a system such as automated drug delivery will be
wrongly active.

For Patient 1 all four statistics were capable of 100% TPR with specificity
less than 2%, and less than 1% if we exclude cross correlation. Results are
not as good for Patient 2 where only the correlation dimension can achieve 
100% TPR and the cross correlation cannot be used at all. Similar difficul- 
ties exist for Patient 3. From discussions in Chapter 3 it is not surprising 
that cross correlation is a less suitable detector of intra-cranially measured 
seizures because electrodes are close to one another and greater synchronicity 
is observed at all times, not just during a seizure. The correlation dimension 
statistic is the best at detecting all seizures in all patients, whilst making no 
false detections. This comes at the expense of much greater computational 
complexity than for the linear statistics. 

Let us now turn our attention to the warning times these statistics provide, 
shown in Figure 5.15. Each graph shows the computed onset warning time 
for one patient, averaged over all detected seizures (i.e., undetected seizures 
are not included). Results must be coupled with those in Figure 5.14 to be 
complete. The variance of each computed mean is shown as horizontal lines 
on this mean, and the specificity for different thresholds shown in the vertical 
axis. The clinical seizure occurs at time zero, so that positive values indicate 
a detection is made before the clinical onset. 
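The averaging used for Figure 5.15 can be sketched as follows. The rule that the first detection inside a search interval around each clinical onset defines the warning time, and the interval lengths themselves, are assumptions for illustration; only the sign convention (positive means before clinical onset) and the exclusion of undetected seizures come from the text. Function names are hypothetical.

```python
import numpy as np

def warning_times(detections, clinical_onsets, search_before=120.0, search_after=60.0):
    """Warning time (clinical onset minus first detection) for each seizure, in seconds.

    A seizure with no detection inside [onset - search_before, onset + search_after]
    is undetected and gives None.
    """
    det = np.sort(np.asarray(detections, float))
    out = []
    for onset in clinical_onsets:
        hits = det[(det >= onset - search_before) & (det <= onset + search_after)]
        out.append(onset - hits[0] if hits.size else None)
    return out

def mean_warning(times):
    """Mean and variance over detected seizures only (undetected ones are excluded)."""
    w = np.array([t for t in times if t is not None], float)
    return w.mean(), w.var()

times = warning_times([95, 130, 500], [120, 510, 900])  # [25.0, 10.0, None]
```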

Each column in this figure shows the performance of one statistic applied 
to all patients, and each row shows the results for all statistics applied to 
one patient. Looking first at cross correlation in column 1, only Patient 1 
benefits from its use. Average warning times of more than 20 seconds occur 
with 100% TPR and a specificity of 2% or greater. This is a fairly good 
performance overall. 

The nature of the PSD and wavelet statistics is very similar, and the results
in columns 2 and 3 reflect this. Patient 1 exhibits 30-40 seconds of warning,
100% TPR and a specificity of about 0.5%, an overall better performance than
the cross correlation. Prospects are not as good for Patient 2, where 100% TPR
is only possible using the wavelet statistic, and then only with high specificity.
The variability in the computed onset times indicates that detections are rarely
made before clinical seizure onset. The PSD statistic is not applicable to
Patient 3, and whilst it is possible to use the wavelet statistic for this patient,
the onset warning times come with specificities that are too high.

The story is much more positive for correlation dimension in the last column,
and the added computational complexity is justified in Patients 2 and
3. Whilst not perfect, warning times for most detections occur before clinical 
onset (at 100% TPR) with 6% specificity in Patient 2 and 8-10% for Patient 
3. Average warning times are close to 50 seconds in both cases. 

When designing a patient-specific onset detector it is not necessary to use
the same statistic for all patients; it suffices to choose the best one. For
example, Patient 1 benefits from any of the four statistics discussed here, and
weighing up the computational expense versus the warning time, it is probably 
best to use PSD or wavelets over correlation dimension. Recall that Patient 1 
has an average of 27 seconds between electrographic and clinical onset, thus 
the window to react is relatively long. This is not so for Patient 2 who has 
an average of 2 seconds to react. In this case it may be necessary to live with 
higher specificity as well as the computationally more cumbersome correlation 
dimension. Alternatively in such cases it may be necessary to shift to an even 
smaller scale measurement that may reveal longer lead-up times to clinical 
onsets. 

When results such as those here are considered it may be worthwhile asking 
the question: is there a need for seizure prediction? Given that so little 
progress has been made in this regard in the last decade (see Chapter 7) it 
may be necessary to move to a detection based regime. If research focus shifts 
to development of technology that can intervene quickly then Patient 1 and 
to a smaller extent Patients 2 and 3 can be helped immediately and with very 
simple signal processing. Of course the results are still patient specific and 
may not be applicable to all, but this is no different to the expectations placed 
on current seizure predictors. 


In summary, Figures 5.14 and 5.15 reveal that, for intra-cranial 
EEG recordings, it is sometimes possible to consistently detect 
a seizure well before its clinical onset. This occurs with more 
warning when a non-linear statistic such as correlation dimen- 
sion is used, but it comes at a computational expense that is 
not necessary in some cases because linear statistics perform 
comparably. 
5.5 Conclusions


A detector must be designed on an application specific basis. Similarly its 
evaluation is only possible when appropriate data and performance metrics 
are selected. In this chapter we evaluated several aspects of epileptic seizure 
detection using the EEG. 


In summary, this chapter shows that: 


• Evaluation of Classifiers: On average SVM outperforms ANN and is
simpler to implement.


• Evaluation of Patient-Unspecific Detectors: Reveal achieved the best
FPR with sufficiently high TPR, whilst the best TPR is achieved by
CNet.


• Evaluation of Onset Detectors: Correlation dimension proved the most
robust at the expense of a much greater computational burden.


This list by no means implies that all applications and all implementations of
seizure detectors have been evaluated. It is only a first illustration of how such
an evaluation may be performed in a standard manner.


Algorithms should be tested in a standard way, using standard 
data and standard performance metrics. Results reported in any 
other way cannot be compared. 


The data used in this chapter will be made publicly available with the 
publication of this book, so that this standard can be maintained. Visit this 
book’s website for updates. 
Modeling for Epilepsy


“The best model of a cat is a cat. Preferably the same cat.” 


- “Philosophy of Science”, 1945, Arturo Rosenblueth (1900-1970)


A dynamic model is a set of mathematical equations capable of simulating 
the behavior of a system, at least in part. Its purpose is to explain the system 
behavior: What is it capable of? What is it not capable of? If the model 
is supported by physical observations, can these be explained by the model? 
The set of equations must be complex enough to describe the dynamics of
interest but preferably simple enough so that mathematical tools exist to 
analyze them. At a bare minimum the model must be computable. 

To construct a model of an epileptic brain one must look for cases where 
epilepsy exists — if the best model of a cat is a cat, why not simply look at 
a cat? Or, better yet, the same cat? The problem with this approach is 
that mice can also have epilepsy (so can humans for that matter). A model 
that can only explain the epilepsy of the one cat is unlikely to be useful in 
developing an understanding of epilepsy in general. If instead we start with 
‘epilepsy is common to all mammals’, we may model characteristics that are
present in all forms. 

In this chapter models of the brain capable of describing epileptic seizures 
are presented. These are limited to models that are dynamic rather than 
static, to reflect the time-evolving nature of both the seizures as well as the 
measured EEG used to corroborate them. They are also limited to those that 
reflect the activity of networks rather than single neurons. Recall that on
the order of thousands of neurons must collaborate for seizure-like activity to
occur [73].

Limited resources, computing technology or data, coupled with the possible
presence of chaos, make explaining complex systems difficult. Even
very well known systems such as the weather are too complex and cannot be
modeled accurately because of computational limitations and lack of data. A
human brain is of similar, if not greater, complexity to the weather: if the
activity of each of the 100 billion (10¹¹) neurons in the brain is to be known
then in the weather example this is analogous to making a measurement of
temperature, pressure, humidity, etc., every 50-100 meters of the earth’s surface.
This requirement is feasible neither economically nor computationally.
The same is true for the brain. We cannot measure the activity of every
single neuron, and even if we could it would still be very difficult to draw
useful interpretations from such vast amounts of data.
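The weather analogy can be checked with a line of arithmetic. The sketch below assumes an Earth surface area of about 5.1 × 10^14 m² (roughly 510 million km²) and a square measurement grid. Both are simplifications introduced here, not figures from the text.

```python
EARTH_SURFACE_M2 = 5.1e14   # assumed: approx. 510 million km^2
NEURONS = 1e11              # order-of-magnitude neuron count from the text

def grid_points(spacing_m):
    """Number of measurement sites on a square grid with the given spacing."""
    return EARTH_SURFACE_M2 / spacing_m ** 2

for spacing in (50, 100):
    print(f"{spacing} m grid: {grid_points(spacing):.1e} sites")
```

A 100 m grid gives about 5 × 10^10 sites and a 50 m grid about 2 × 10^11, so the two spacings bracket the 10^11 neurons of the brain.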
Simplifications allow us to manage our problem. Some examples include: 


• Model average activity: Assume that changes in weather conditions
over distances in the order of, say, 50km are insignificant. In this way
the size of the global problem can be reduced by a factor of a million.
A reduction of similar order can be achieved in the brain: the 50,000 or
so neurons in a cortical column discussed in Chapter 1 are believed to
behave roughly in synchrony, thus reducing the complexity of the problem
significantly. The assumed synchrony is a reasonable approximation because
the time scales required to cause a column to act ‘as one’ are much
smaller than those of interest in the study of epilepsy.


• Reduce parameter space: In a complex system the fewer variables or
parameters used the better. The benefits are exponential. Think of
a model of N compartments put together, where the equations of each
compartment have M parameters each with K possible values. The parameter
space then contains $K^{MN}$ combinations, so reducing the number of
parameters M by one results in a multiplicative reduction in complexity
by a factor of $K^N$. It is important to have sufficient detail to
be representative of the system of interest, but it is more important
to reduce the number of variables to the absolute minimum. In the
weather example, the model can be simplified by omitting information 
such as, say, elevation above sea level, if this does not significantly affect 
the temperature on a flat landscape. In the brain this means omitting 
information such as the differences in conductivity between gray and 
white matter, dendritic shape and size, or concentrating on the firing 
patterns rather than the physiology of the neurons that produce them 
if these details do not, on average, affect macro-scopic EEG. 


• Tackle a smaller problem: For the epileptic brain the sub-system most
often studied is the cortical column. Its activity is formulated independently
of the remainder of the brain, allowing for the understanding of the
behavior that is not a result of global network interactions.
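The reductions quoted in the first two strategies can be reproduced with simple counting. This is a sketch under two assumptions stated in its comments: the number of sites on a 2-D surface scales as 1/spacing², and every one of the N·M parameters independently takes one of K values. The function names are hypothetical, and the values N = 3, M = 4, K = 10 are arbitrary.

```python
def averaging_reduction(fine_spacing_m, coarse_spacing_m):
    # Assumption: on a 2-D surface the number of sites scales as 1 / spacing^2
    return (coarse_spacing_m / fine_spacing_m) ** 2

def parameter_space_size(n_compartments, m_params, k_values):
    # Assumption: each of the N*M parameters independently takes one of K values
    return k_values ** (m_params * n_compartments)

print(averaging_reduction(50, 50_000))   # 50 m grid averaged over 50 km
print(1e11 / 50_000)                     # cortical columns of ~50,000 neurons
N, M, K = 3, 4, 10
print(parameter_space_size(N, M, K) // parameter_space_size(N, M - 1, K))  # equals K**N
```

Averaging a 50 m grid over 50 km gives the factor of a million quoted above. Grouping 10^11 neurons into columns of about 50,000 leaves about 2 × 10^6 units. Dropping one parameter per compartment divides the parameter space by $K^N$.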


The models presented in this chapter employ all these simplifying strate- 
gies. 


We seek a model of the entire brain, but this is too difficult 
a task at present. Models of neuronal networks presented in 
this chapter attempt to understand the uniform activity in sub- 
systems of the brain. They focus on how the behavior exhibited 
by these sub-systems is affected by parameters representative of 
the underlying biology. These parameters represent average ac- 
tivity in a meso-scopic network of neurons. One such sub-system 
is the cortical column, although the theory is not restricted to
this. 


Simplifying complex models is necessary but it comes at a 
cost. It relies on assumptions that do not always hold and hence 
conclusions must be checked carefully. 


A model can be deterministic or contain stochastic components. It is 
rare to find a completely deterministic system where a set of mathematical 
equations determines dynamics exactly. Most models are a combination of 
deterministic and stochastic elements and it is important to understand the 
different types of random contributions: 


• Model simplification: Using a simplified representation results in discrepancies
between true and model behavior. The simplifications are
necessary to make the problem tractable, but comparisons between 
model and real data reveal fluctuations that cannot be explained by 
the model. These may be accounted for by a stochastic component. 
If the simplifying assumptions are valid then these errors can be kept 
small. 


• Stochastic activity: The mechanics of certain aspects of a system are
sometimes not understood, either because suitable experimental data
do not exist or because their complexity cannot, at present, be captured
by analytical methods. Even though the underlying activity may be de- 
terministic these components appear random and can be approximated 
as such. These random elements are part of the system itself and have 
particular properties (e.g., mean, variance, distribution) gathered from 
experimental data or assumed based on realistic constraints. An exam- 
ple is presented in Section 6.2 where a model based on the stochastic 
nature of action potential firing rates is presented. 


• Stochastic inputs: The system equations can be deterministic but
driven by a stochastic process, that is, the input to the system is somehow random.
how random. One example is presented in Section 6.3 where the model 
equations are deterministic (even though they are derived from stochas- 
tic observations) but the input to this model is stochastic. Again, the 
real input to this system could in actuality be deterministic, but is so 
complex that it is approximated as a random process. 


Any combination of the above is possible. 
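To make the third case concrete, the sketch below integrates a deterministic leaky equation, dx/dt = -x/τ + u(t), driven by a white-noise input u(t), using the Euler-Maruyama scheme. The equation, the function name and the parameter values are illustrative assumptions and are not one of the models of this chapter.

```python
import numpy as np

def simulate_driven_leaky(tau=0.02, noise_sd=1.0, dt=1e-4, t_end=20.0, seed=0):
    """Deterministic leaky dynamics dx/dt = -x/tau + u(t), with u(t) white noise.

    Integrated with the Euler-Maruyama scheme: over one step the stochastic
    input contributes a Gaussian increment with standard deviation
    noise_sd * sqrt(dt).
    """
    rng = np.random.default_rng(seed)
    n = int(round(t_end / dt))
    kicks = noise_sd * np.sqrt(dt) * rng.standard_normal(n - 1)  # stochastic input
    x = np.zeros(n)
    for k in range(1, n):
        drift = -x[k - 1] / tau               # deterministic model equation
        x[k] = x[k - 1] + drift * dt + kicks[k - 1]
    return x

x = simulate_driven_leaky()
# Stationary variance of this process is approximately noise_sd**2 * tau / 2 = 0.01
print(x.mean(), x.var())
```

The equation itself contains no randomness. All of the variability in x comes from the input, which is the situation described in the last bullet above.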

In the remainder of this chapter the emphasis is on the creation of models 
suitable to describe the EEG. Real EEG data are used to validate the model. 
Whereas in Chapter 2 the emphasis was on the passive mechanisms that affect 
the EEG, the focus here is on the active generators of electrical activity within 
the brain. 
FIGURE 6.1: Neural models at different scales. In (a) a single neuron model
is shown. The variables of choice are the input and output currents of
the cell body (soma). In (b) the activity in a network of neurons is
modeled by the interactions between neurons of different types, in this case
the excitatory and inhibitory neurons in the cortex. $\nu_{ab}$ denotes the strength of
the response that neurons of type b have on neurons of type a. Larger scale models
can be generated by joining neural populations together, as in (c), where the 
primary pathways between cortical and subcortical networks are shown. In 
(b) and (c) the parameters and equations must be derived from averages over 
neural populations. TRN refers to the Thalamic Reticular Nucleus, and SRN 
to the Sensory Relay Nucleus. 


In this chapter, Section 6.1 explains the parameters that are relevant in 
the meso-scopic and macro-scopic modeling of brain dynamics, expanding on 
the information presented in Section 1.1. 

In Sections 6.2-6.4 we present three different classes of models relating to 
three different scales in the brain: those derived from micro-scopic activity 
(Section 6.2); those that take averages over neural populations so as to describe 
the phenomena of the EEG (Section 6.3); and those that target large-scale 
dynamics (Section 6.4). In all cases it is networks of cortical neurons that are 
of interest because these are believed to play an integral role in generating 
and sustaining seizures. 

Finally, Section 6.5 shows how these models can be used. We examine
aspects of epileptic seizures inferred from them, and present practical ways
in which they are applied.
6.1 Physiological Parameters of Neural Models


In this section we introduce the parameters and variables that are important in 
modeling the dynamics of the brain. The models themselves will be described 
in detail later. Let us go back to the definition of a system in Equation 3.2, 
and in particular the state transition formula, rewritten in continuous time as 


$$ z(t) = P\big(\kappa(t), u(t), t;\, z(t_0)\big) \tag{6.1} $$

This equation describes the evolution of the system state $z(t)$, given the
state $z(t_0)$ at an initial time $t_0$, a parameter set $\kappa(t)$ and inputs $u(t)$,
both of which depend on time $t$. The behavior is determined by the map $P$.

All dynamic models of the brain can be written in this general form, and 
can be made as detailed or as coarse as required by the application by chang- 
ing each of its components, possibly including infinite dimensional state z(t). 
Dynamic models have variables incorporated into the state z(t), and it is 
the dynamics (or changes) of these variables that we want to observe when 
we study them. They also have a set of parameters $\kappa(t)$ that parametrize the
family of maps $P$. Parameters are quantities that are altered to cause changes
in the dynamics of the variables, but are typically considered constant over
the time scale of interest. That is,

$$ \frac{d\kappa(t)}{dt} = 0 \tag{6.2} $$

is assumed. However in reality both parameters and variables are capable
of change, hence the explicit dependence of $\kappa(t)$ on time $t$. But whereas the
changes in parameters are caused through means external to the model, the
changes in the variables must occur through the dynamics represented by the
mathematical equations. In any case the distinction between parameter and
variable is somewhat artificial from a mathematical point of view because an
equation in the form of Equation 6.2 can always be incorporated into Equation
6.1 by treating $\kappa(t)$ as additional components of the state.

There is no set of rules as to which quantities should be parameters and
which should be variables. General guidelines include:


• Time scale of change: Parameters are those that change slowly relative
to the time scale of interest. In most mathematical analyses parameters
are assumed static. For some this is true, for example the length of an
axon remains relatively unchanged throughout time. Others are capable
of slow changes. For example the number of neurons in a network is
relatively static on the time scale of 1 hour, but when looking at time
scales as long as decades cell death can be important in dynamics
that depend on neural populations. Another example is the plasticity
of the brain, constantly learning and adapting to change, but at a much
longer time scale than the EEG.


• Availability of experimental evidence: In general, parameters are those
that are measurable and for which experiments have been or will have
to be done. Realistic constraints on these parameters exist. Variables
are more explorative in that a general understanding of realistic values
exists but it is more difficult to determine how small changes affect their
behavior.


An example of a parameter versus a variable is that of neurotransmitters 
and neuromodulators, explained in Chapter 1. The modeling of EEG dy- 
namics is concerned with time scales of milliseconds. At these time scales 
the effects that any one neurotransmitter has on the system (as opposed to
the changes in neurotransmitter concentration) are assumed roughly uniform 
and for simplicity these effects are represented as static parameters. However 
over time the neuromodulators modulate the reaction to neurotransmitters, 
resulting in changes in EEG dynamics. When it is the dynamics of neurotrans- 
mitters that are of interest then equations where these are variables and the 
neuromodulators are parameters must be derived. A parameter can become 
a variable at different time scales. 


A variable in a model is a part of the state, or can be derived 
from the state, parameters, and inputs to the model. The parameters
$\kappa(t)$ are properties of the system, which when modeling
real systems are in principle experimentally measurable. 


For all intents and purposes parameters are considered static, 
although in a system such as the brain some parameters are 
capable of changing over time. This change occurs at a much 
slower rate than the changes exhibited by the variables. If this is 
not the case then these parameters must themselves be included 
as variables in the model. 
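The claim that a parameter can be moved into the state can be demonstrated numerically. The sketch below integrates a toy model dz/dt = -κz + u(t) twice with forward Euler. The first run treats κ as a fixed parameter. The second appends κ to the state with dκ/dt = 0, which is Equation 6.2. The model, the input u(t) and the value of κ are hypothetical choices made for illustration.

```python
import numpy as np

def euler(f, y0, dt, n_steps):
    """Forward-Euler integration of dy/dt = f(y, t); returns all states."""
    y = np.array(y0, dtype=float)
    out = [y.copy()]
    for k in range(n_steps):
        y = y + dt * f(y, k * dt)
        out.append(y.copy())
    return np.array(out)

def u(t):
    return np.sin(2 * np.pi * t)        # hypothetical input u(t)

KAPPA = 5.0                             # hypothetical parameter value

def parameter_form(z, t):
    # kappa is a fixed parameter: dz/dt = -kappa z + u(t)
    return np.array([-KAPPA * z[0] + u(t)])

def augmented_form(y, t):
    # kappa is moved into the state, with dkappa/dt = 0 (Equation 6.2)
    z, kappa = y
    return np.array([-kappa * z + u(t), 0.0])

a = euler(parameter_form, [1.0], 1e-3, 2000)
b = euler(augmented_form, [1.0, KAPPA], 1e-3, 2000)
print(np.allclose(a[:, 0], b[:, 0]), np.all(b[:, 1] == KAPPA))
```

Both runs give the same trajectory for z. If κ were instead given slow dynamics of its own, the augmented form is where those dynamics would be written.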


In the remainder of this section we outline the parameters $\kappa(t)$ of most
importance for single neuron and network models. Under the assumption
that in the time scale of interest these parameters are constant, the explicit
dependence on time $t$ for parameters described here is omitted for better
legibility. Again, effort is made to restrict discussions to those parameters 
believed relevant in the modeling of macro-scopic dynamics of the epileptic 
brain. 

The reader can reference a list of the parameters described here, along
with their notation, in Appendix 6.A. The equations in the form of Equation
6.1 that explain the behavior of the brain using the parameters defined here
are outlined in later sections. 


6.1.1 Parameters in Single Neurons 


The parameters of interest at the single neuron level vary with the model. The
well known Hodgkin-Huxley model of the action potential [66] describes the
ion concentrations inside and outside the neural membrane. These concentrations
are the variables that we want to observe, and the constant quantities
that define how easily charge is transmitted through the membrane are the
parameters. This model provides a very realistic description of currents generated
in the firing process. However the set of equations, although computable
for a single neuron, becomes too unwieldy even with a moderate number of
inter-connected neurons. Focusing on parameters at this scale (both spatial
and temporal) is not useful for models of networks of neurons, because these
are predominantly concerned with larger spatial and longer temporal scales.

A simpler and more analytically tractable alternative is to model neurons 
as integrate-and-fire units, that is, model the currents on the dendritic tree 
as inputs that are integrated (summed) at the cell body to determine if an 
action potential is fired as an output. Unlike the Hodgkin-Huxley approach 
the morphology of the inputs and outputs is not important — both currents are 
modeled as spikes. Changes in parameter values can alter the observed firing 
pattern of the modeled neuron, and consequently can be used to represent the 
different types of neurons, if such detail is necessary. Thus the parameters that 
are important in this case are the properties of the neuron that generate action
potentials, including the resting membrane potential $V_p$ that determines the
voltage of the membrane with no input; the passive membrane time constant
$\tau_p$ and membrane capacitance $c$ that determine how fast changes in membrane
potentials are possible; and voltages $V_{Th}$ and $V_R$ that determine when an
action potential is fired and the potential of the membrane after this occurs.
The full parameter set listed in Appendix 6.A can be used to model the
variables for membrane potential $V(t)$ and the currents $I_{leak}(t)$, $I_{spike}(t)$ and
$I_{syn}(t)$ (shown in Figure 6.1(a)) that describe $V(t)$.


6.1.2 Parameters in Networks of Neurons 


A mathematical formulation of a collection of interacting neurons may or may 
not make use of the characteristics of a single neuron. Examples of both are 
presented in Sections 6.2-6.4. In either case the network parameters must be 
derived or hypothesized from evidence provided by ensembles of single neurons 
because a solution that is consistent at all spatial scales is preferable. The 
mean firing rate $Q_a(t)$ of neurons of type $a$ is one such example, where the
parameters in this function describe averages over neuron populations. 

One important aspect in a network of neurons is their expected histology, 
that is, the type and number of neurons in this network. In a slice of cortex 
there are excitatory (e) and inhibitory (i) neurons, and their numbers $N_a$, $a = e, i$,
must be estimated from experimental data¹. Each neuron receives input from
a specific number of synapses $C_{ab}$ formed from neurons of type $b$ onto neurons of
type $a$. Whereas $a$ can take only the values of the types of neurons within the
network ($a = e, i$), $b$ can take an additional value $b = n$, where $n$ represents
input from neurons outside the network. External inputs are usually modeled
as stochastic elements, as is discussed in more detail later. An illustration of
this is shown in Figure 6.1(b). 

These parameters are measurable and can be used directly in the model
or indirectly to deduce other parameters. For example, if $\nu_{ab}$ denotes the
magnitude of the response that neurons of type $b$ induce on neurons of type
$a$, then $\nu_{ab}$ is proportional to the number of synapses $C_{ab}$ and to the
strength $s_b$ of the current caused by a single synapse from a neuron of type $b$.
These latter are also experimentally measurable quantities.
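The proportionality $\nu_{ab} \propto C_{ab} s_b$ can be written as a small table-building function. This is a sketch only. The proportionality constant `gain`, the function name, and the convention of a negative $s_i$ for inhibitory synapses are assumptions. The numbers in the usage line are placeholders, not physiological estimates.

```python
TARGETS = ("e", "i")        # a: neuron types inside the network
SOURCES = ("e", "i", "n")   # b: additionally includes external input n

def coupling_strengths(C, s, gain=1.0):
    """Return nu[(a, b)] = gain * C[(a, b)] * s[b]  (nu_ab proportional to C_ab s_b).

    C: dict mapping (a, b) -> number of synapses from type b onto type a.
    s: dict mapping b -> strength of a single synapse from type b
       (assumed sign convention: negative for inhibitory synapses).
    gain: hypothetical proportionality constant.
    """
    return {(a, b): gain * C[(a, b)] * s[b] for a in TARGETS for b in SOURCES}

# Placeholder values for illustration only
C = {(a, b): 2 for a in TARGETS for b in SOURCES}
s = {"e": 1.0, "i": -2.0, "n": 0.5}
print(coupling_strengths(C, s))
```

Keeping $C_{ab}$ and $s_b$ separate means that each can be replaced by its own experimental estimate without touching the rest of the model.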

Recall that dividing a system into sub-systems is necessary to simplify the 
development of macro-scopic models. A patch of cortex, as opposed to the en- 
tire cortex, is one example of a sub-system. It can be modeled independently, 
as in Figure 6.1(b), or in conjunction with other sub-systems, as in Figure 
6.1(c) where the cortex interacts with thalamic networks. In such cases it is 
important to incorporate information about each sub-system ($N_a$, $C_{ab}$ and
$s_b$) as well as the strength of interactions between sub-systems, also
experimentally observable quantities. Additional information such as the propagation
time $t_0/2$ of signals between sub-systems in Figure 6.1(c) is also necessary.
To distinguish between sub-systems different subscripts are used. Excitatory
and inhibitory neurons in the sub-cortex are henceforth denoted s and r
respectively to differentiate them from excitatory and inhibitory neurons in the
cortex.


In summary, a neural population with $N$ neurons behaves
according to the number $N_a$ of each neuron type $a$, the number
of synapses $C_{ab}$ resulting from other neurons of type $b$, and
the strength of the synapse $s_b$. All these parameters are used
directly or indirectly to model neural networks.


Connecting sub-systems of neural networks together requires 
other similar parameters that describe their interactions, includ- 
ing propagation delays. The manner in which the sub-systems 
are connected affects dynamics. 


¹Of course, for an accurate model it may also be important to model the different types
of excitatory and/or inhibitory neurons found in the brain, along with many other details 
omitted in this discussion. However for the purposes of this book, as well as for the very 
real necessity of a tractable mathematical model, many of these important details are not 
included here. In theory these models can be extended to include any number of different 
types of neurons. 
In all cases the parameters must reflect the physiology of real
neuronal networks, either by restricting them to realistic ranges 
determined from experimental data or by using knowledge of 
the systems to infer their behavior. 


Using the physiology in Section 1.1 and the parametrization discussed here 
we are now in a position to write mathematical equations of neural models at 
the different scales. We want to use these equations to describe the active 
generators of electrical activity, largely ignoring the passive mechanisms such 
as volume conduction explained in Chapter 2. Although these are impor- 
tant in describing the EEG we must wait until future developments allow the 
mathematics and the computing to describe large regions of the cortex. 

To avoid lengthy discussions of all available models and their variants, a
subset that highlights the key issues is selected. These models are believed to be
the most general whilst still being relevant to describing epileptic seizures in both
scale and parametrization.




6.2 Micro-Scopic (Statistical) Models 


We start the description of neuronal dynamics at the micro-scopic scale. A 
class of models that capture the statistical behavior of ensembles of neurons
based on physiology at the micro-scopic scale is that of integrate-and-fire
(IF) neuron models. They provide a biologically inspired representation of
background network activity in a local and homogeneous region of the cortex.
They are based on the simpler characteristics of neural firing (nowhere near
the complexity of the Hodgkin-Huxley² model) but are nevertheless capable
of replicating experimentally observed behavior of networks of interconnected 
cortical neurons in vivo. 


Integrate-and-fire (IF) models describe the statistics of the 
activity in a network of cortical neurons. The model may be 
stochastic if it includes a stochastic description of the input. The 
model may be deterministic if it only describes the distribution 


²The Hodgkin-Huxley model describes the generation and propagation of an action potential
based on trans-membrane cellular currents and concentrations. In this book we have
focused on phenomena at a larger scale than this, and have never described the biochemical 
and gating mechanisms that make this possible. Thus a model of this nature is not included 
here, but the interested reader may consult [66]. 
of the statistics. Typical distribution quantities used as part of
the state $z(t)$ are:


• $\nu(t)$: The mean spiking rate of action potentials in the network.
• $CV(t)$: The coefficient of variation of the inter-spike interval.
• $\mu(t)$: The mean membrane potential in the network.
• $\sigma(t)$: The standard deviation of the membrane potential.
• $\tau(t)$: The effective membrane time constant.


The most important advantage of this class of models is that
experimental data for the above variables exist, which can be
used to validate the model.
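Because these quantities are measured experimentally, it helps to see how they are estimated from recordings. The sketch below computes $\nu$ and the CV from a set of spike trains, and $\mu$ and $\sigma$ from sampled membrane potentials. Two choices are assumptions made here: the rate is taken per neuron averaged over the network, and the CV is computed per neuron and then averaged. Published analyses may pool the data differently.

```python
import numpy as np

def network_spike_statistics(spike_trains, duration):
    """Mean firing rate nu (Hz per neuron) and mean CV of inter-spike intervals.

    spike_trains: list of 1-D arrays of spike times (s), one per neuron.
    duration: length of the recording (s).
    """
    n_neurons = len(spike_trains)
    total_spikes = sum(len(st) for st in spike_trains)
    rate = total_spikes / (n_neurons * duration)     # nu, averaged over neurons
    cvs = []
    for st in spike_trains:
        isi = np.diff(np.sort(np.asarray(st, dtype=float)))  # inter-spike intervals
        if len(isi) > 1:
            cvs.append(isi.std() / isi.mean())       # CV = std / mean of ISIs
    cv = float(np.mean(cvs)) if cvs else float("nan")
    return rate, cv

def membrane_statistics(v):
    """Mean mu and standard deviation sigma of sampled membrane potentials."""
    v = np.asarray(v, dtype=float)
    return float(v.mean()), float(v.std())

# A perfectly regular 10 Hz train has CV close to 0; Poisson firing has CV close to 1
regular = np.arange(1, 11) * 0.1
print(network_spike_statistics([regular, regular], duration=1.0))
```

The two reference points in the usage comment are useful sanity checks. Irregular in vivo cortical firing is often described by comparison with the Poisson case.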


Following is a brief description of the mathematics along with important 
results. A comprehensive review on IF models can be found in [24] and [25]. 
Other related literature includes [23] and [108]; this is by no means a complete 
list. 


6.2.1 Model Summary 


Biologically inspired models first look at a single neuron. For those interested 
in a reference of the model only, a summary of all parameters, variables and 
equations can be found in Appendix 6.B, complemented by parameters in 
Appendix 6.A. 

A single neuron fires an action potential depending on the balance of in- 
coming and outgoing currents. The membrane potential V (t) is the difference 
in potential between the inside and the outside of the cell body (it is the proper 
definition of a voltage). Changes in V(t) occur according to three dominant 
currents shown in Figure 6.1(a): the passive currents in the membrane used to 
restore ion concentrations ($I_{leak}(t)$), the currents formed by the synaptic
inputs ($I_{syn}(t)$) and the output current of the action potential ($I_{spike}(t)$). Each
of these currents groups together the behavior generated by many different 
classes of ion channels. 

The dynamics of $V(t)$ are described by

$$ c\,\frac{dV(t)}{dt} = -I_{leak}(t) - I_{spike}(t) - I_{syn}(t). \tag{6.3} $$

The capacitance $c$ scales the rate of change of the membrane potential.
When there are no input and output currents the membrane voltage relaxes
towards the resting membrane potential $V_p$, which is assumed constant. When
currents are present, $I_{leak}(t)$ works to restore the ion concentrations to this
resting state. Thus $I_{leak}(t)$ is proportional to the difference $(V(t) - V_p)$.
Mathematically³,


$$ I_{leak}(t) = \frac{c}{\tau_p}\big(V(t) - V_p\big), \tag{6.4} $$

where the membrane time constant $\tau_p$ is used to capture the speed at
which the leak current restores $V(t)$ to $V_p$.
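With no synaptic input and the membrane below threshold, Equations 6.3 and 6.4 reduce to $dV/dt = -(V - V_p)/\tau_p$. The solution is $V(t) = V_p + (V(0) - V_p)e^{-t/\tau_p}$, so $\tau_p$ is the time taken to close about 63% of the gap to rest. The sketch below integrates this case with forward Euler and compares it with the closed form. The numerical values are illustrative assumptions, not parameters given in the text. The capacitance cancels, as the equations predict.

```python
import numpy as np

V_P = -70e-3    # resting potential V_p (V), illustrative value
TAU_P = 20e-3   # membrane time constant tau_p (s), illustrative value
C_M = 200e-12   # capacitance c (F), illustrative; it cancels out

def leak_current(v):
    """Equation 6.4: I_leak = (c / tau_p) * (V - V_p)."""
    return C_M / TAU_P * (v - V_P)

def relax(v0, dt=1e-5, t_end=0.1):
    """Forward-Euler integration of c dV/dt = -I_leak (no synaptic input, no spikes)."""
    n = int(round(t_end / dt))
    v = np.empty(n + 1)
    v[0] = v0
    for k in range(n):
        v[k + 1] = v[k] - dt * leak_current(v[k]) / C_M
    return v

t = np.linspace(0.0, 0.1, 10_001)
v = relax(-60e-3)
exact = V_P + (-60e-3 - V_P) * np.exp(-t / TAU_P)   # closed-form solution
print(np.max(np.abs(v - exact)))                    # small discretization error
```
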
$I_{spike}(t)$ is the current caused by an action potential spiking on the axon.
It can be described by

$$ I_{spike}(t) = c\,(V_{Th} - V_R)\,\delta\big(V(t) - V_{Th}\big). \tag{6.5} $$

This says that when the membrane potential reaches the threshold $V_{Th}$, the neuron
fires an action potential and the membrane is reset to the reset potential $V_R$, where
$V_R < V_{Th}$. There is a refractory period $\tau_r$ before the membrane potential
begins to evolve again.

Finally, $I_{syn}(t)$ represents the effects of the synaptic inputs to the cell
body. $I_{syn}(t)$ can be described in terms of currents or conductances⁴. Both
have been analyzed in the literature (see, for example, [24]) but only the results for
conductance-based models are presented here for reasons discussed later. The
equation for synaptic current in conductance-based synaptic input is given by

$$ I_{syn}(t) = \sum_{a=e,i,n} c\,\sigma_a\big(V(t) - V_a\big)\,p_a(t), \tag{6.6} $$

where the subscripts $e$, $i$, $n$ represent excitatory, inhibitory and external
(excitatory) synaptic activity respectively, as discussed in Section 6.1, and $V_a$
is the reversal potential of synapses of type $a$. The
inputs can often be recurrent because past outputs can affect present behavior.
This is represented by $p_a(t)$, a function drawn from a suitable distribution
that integrates firing times of pre-synaptic neurons as well as transmission
delays of post-synaptic potentials. The parameters $\sigma_a$ are a unitless measure
of conductance of synapses formed from neuron $a$ (more details can be found
in [108]).
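Equations 6.3 to 6.6 together define one neuron, and the sketch below simulates it. Several choices are assumptions rather than part of the model as stated:

- Each $p_a(t)$ is taken to be a Poisson train of delta pulses, with transmission delays ignored. A step containing $m$ input spikes of type $a$ therefore changes $V$ by $-\sigma_a(V - V_a)m$.
- The δ-function spike current of Equation 6.5 is implemented as a threshold check followed by a reset.
- All numerical values are illustrative and are not taken from [24] or [108].

```python
import numpy as np

# Illustrative values only (assumptions, not taken from the text)
V_P, V_TH, V_R = -70e-3, -50e-3, -60e-3        # resting, threshold, reset potentials (V)
TAU_P, TAU_R = 20e-3, 2e-3                      # membrane time constant, refractory period (s)
V_REV = {"e": 0.0, "i": -80e-3, "n": 0.0}       # hypothetical reversal potentials V_a (V)
SIGMA = {"e": 0.01, "i": 0.04, "n": 0.01}       # unitless conductances sigma_a
RATE = {"e": 1000.0, "i": 500.0, "n": 2000.0}   # total input spike rate per type (Hz)

def simulate_if_neuron(t_end=2.0, dt=1e-4, seed=0):
    """One conductance-based IF neuron: Equations 6.3-6.6 divided through by c.

    Each p_a(t) is assumed to be a Poisson train of delta pulses, so in a step
    containing m input spikes of type a, V changes by -sigma_a * (V - V_a) * m.
    """
    rng = np.random.default_rng(seed)
    n = int(round(t_end / dt))
    counts = {a: rng.poisson(RATE[a] * dt, size=n) for a in RATE}  # input spikes per step
    v = np.empty(n)
    v[0] = V_P
    spike_times = []
    refractory_until = -np.inf
    for k in range(1, n):
        t = k * dt
        if t < refractory_until:                 # held at V_R during the refractory period
            v[k] = V_R
            continue
        dv = -(v[k - 1] - V_P) / TAU_P * dt      # leak current, Equation 6.4
        for a in RATE:                           # synaptic current, Equation 6.6
            dv -= SIGMA[a] * (v[k - 1] - V_REV[a]) * counts[a][k]
        v[k] = v[k - 1] + dv
        if v[k] >= V_TH:                         # threshold reached: fire and reset, Eq. 6.5
            spike_times.append(t)
            v[k] = V_R
            refractory_until = t + TAU_R
    return v, np.array(spike_times)

v, spikes = simulate_if_neuron()
print(f"{len(spikes)} spikes in 2 s, mean V = {v.mean() * 1e3:.1f} mV")
```

Because each synaptic term is weighted by $(V - V_a)$, an input moves the membrane towards its reversal potential by an amount that depends on the current state. This is the essential difference from a current-based input.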

The above is a description of how each neuron works, modeling pre- and
post-synaptic currents, their integration and consequent firing. A neural network
is constructed by putting many of these neurons together. For analysis,
a network of $N$ neurons, with $N_e$ excitatory and $N_i$ inhibitory, is used.


³This equation is effectively Ohm's law. Because voltages are defined only up to a
constant we could have re-written a simplified equation as $I_{leak}(t) = \frac{c}{\tau_p} V^*(t)$, where
$V^*(t) = V(t) - V_p$. However because the resting membrane potential $V_p$ plays an
important role in some aspects of the neuron's dynamics it has become convention to explicitly
include it.

⁴Current-based models try to emulate the currents passing through the dendritic tree,
whereas conductance based ones emulate how conductivity is changed because of these 
synaptic currents (see Chapter 2). 
FIGURE 6.2: Pseudo-matrix representation of aggregate models. Here the
equations of one large model are formed by joining three smaller (identical)
models together. The sub-systems are defined by $z_1(t)$, $z_2(t)$ and $z_3(t)$ and
their interactions are modeled via coupling parameters $c_{1,2}(t)$, $c_{1,3}(t)$ and
$c_{2,3}(t)$. The dimensions of aggregate models grow very quickly with the
number of sub-systems, and with the added complexity of coupling parameters this
naive implementation quickly becomes unmanageable. To make the manipu- 
lation of larger models more tractable simplifications that remove unnecessary 
detail should be applied. 


Each neuron of type $a = e, i$ has a number of synaptic inputs ($C_{an}$ external, $C_{ae}$ excitatory and $C_{ai}$ inhibitory). Connections are assumed sparse ($1 \ll C_{ae}, C_{ai} \ll N$) and chosen at random.

To model network activity we need to make the jump from Figure 6.1(a) to Figure 6.1(b). There are several ways to go about this. In the naive approach, modeling $N$ neurons requires $N$ copies of the single-neuron model, along with their interactions. This works because the behavior of a single neuron is a subset of the behavior of an ensemble of neurons. Pictorially (in pseudo-matrix notation) this scenario can be seen in Figure 6.2 for $N = 3$. Each neuron is modeled with its own set of equations, with the interactions as well as the inputs affecting each unit.

However, having $N$ sets of Equations 6.3, 6.4, 6.5 and 6.6 is analytically difficult, perhaps even impossible, for large enough $N$. The dimensions of the matrix in Figure 6.2 grow linearly with the number of neurons, while the number of pairwise coupling terms grows quadratically, as $N(N-1)/2$.
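To make this growth concrete, the sketch below assembles the pseudo-matrix of Figure 6.2 for $N$ identical sub-systems. It is only an illustration under stated assumptions: each sub-system is a hypothetical $2 \times 2$ matrix `A_sub`, and every pair is coupled through the same scalar `c` (neither comes from the model above). The total dimension grows as $nN$, while the number of distinct coupling terms grows as $N(N-1)/2$.

```python
import numpy as np

def aggregate_matrix(A_sub: np.ndarray, N: int, c: float) -> np.ndarray:
    """Block matrix for N identical sub-systems coupled pairwise.

    Diagonal blocks hold the (hypothetical) single-unit dynamics A_sub;
    every off-diagonal block couples unit i to unit j through c * I.
    """
    n = A_sub.shape[0]
    coupling = c * np.eye(n)
    blocks = [[A_sub if i == j else coupling for j in range(N)] for i in range(N)]
    return np.block(blocks)

def n_couplings(N: int) -> int:
    """Number of distinct pairwise coupling terms c_{i,j} with i < j."""
    return N * (N - 1) // 2

A_sub = np.array([[-1.0, 0.5], [0.0, -2.0]])  # illustrative 2-state sub-system
for N in (3, 10, 100):
    M = aggregate_matrix(A_sub, N, c=0.01)
    print(N, M.shape, n_couplings(N))
```

Even with only two state variables per unit, 100 units already need a $200 \times 200$ matrix and 4950 coupling terms.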

As explained earlier, when modeling complex systems it is necessary to make simplifying assumptions. For example, instead of modeling each neuron individually, we can model the average behavior of the ensemble, as done for ensemble IF models. This reduces the complexity of the model and makes the problem tractable, under the assumption that it is a ‘good enough’ approximation to the real activity. Further simplifying assumptions can be made:


1. Because at any one time each neuron receives many small inputs from a network that is sparsely but heavily interconnected (that is, there are many connections, but there could have been many more), the excitatory and inhibitory inputs can be modeled as a Poisson process⁵. The mean firing rate in the network is equal for both excitatory and inhibitory inputs under the generalizing (and perhaps unrealistic) assumption that excitatory and inhibitory neurons have the same characteristics. The strengths of the synaptic inputs are given by $\lambda_{ae,ai}(t) = \nu(t)\,C_{ae,ai}$, where $\nu(t)$ is the average firing rate of neurons in the network.


2. The external input is modeled as a temporally homogeneous Poisson process, that is, the input properties do not change over time and the process has constant intensity (mean firing rate $\nu_n$, and $\lambda_{an} = \nu_n C_{an}$).


3. Because connections are assumed sparse, correlations between firing times of different neurons are negligible. Neurons in such an ensemble can be modeled as firing independently⁶.


4. Correlations between firing times of the same neuron can be ignored be- 
cause experimental observations show that spike times follow a Poisson 
distribution. This is only valid when the neuron is involved in back- 
ground activity. 


5. On average, firing times of a single neuron can be regarded as indepen- 
dent. 
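Assumptions 2 and 4 can be illustrated with a short simulation. The sketch assumes a homogeneous Poisson input of rate $\lambda = \nu_n C_{an}$; the values of `nu_n` and `C_an` are illustrative choices, not values from [108]. The inter-spike intervals of such a process are exponentially distributed, so their coefficient of variation (CV) is close to 1, and the spike counts in fixed windows have a variance close to their mean (Fano factor near 1).

```python
import numpy as np

def poisson_spike_times(rate_hz, duration_s, rng):
    """Homogeneous Poisson process on [0, duration_s): exponential ISIs with mean 1/rate."""
    n_expected = rate_hz * duration_s
    # Draw comfortably more intervals than needed, topping up if (rarely) too few.
    isi = rng.exponential(1.0 / rate_hz, size=int(n_expected + 10 * np.sqrt(n_expected) + 10))
    t = np.cumsum(isi)
    while t[-1] < duration_s:
        extra = rng.exponential(1.0 / rate_hz, size=int(np.sqrt(n_expected)) + 10)
        t = np.concatenate([t, t[-1] + np.cumsum(extra)])
    return t[t < duration_s]

nu_n = 2.0          # illustrative mean external firing rate (Hz), an assumption
C_an = 800          # illustrative number of external synapses, an assumption
lam = nu_n * C_an   # aggregate input rate lambda_an = nu_n * C_an (Hz)

rng = np.random.default_rng(0)
spikes = poisson_spike_times(lam, duration_s=50.0, rng=rng)
isi = np.diff(spikes)
cv = isi.std() / isi.mean()                    # close to 1 for a Poisson process
counts, _ = np.histogram(spikes, bins=np.arange(0.0, 50.0 + 1e-9, 0.1))
fano = counts.var() / counts.mean()            # close to 1 for a Poisson process
print(f"rate={len(spikes) / 50.0:.1f} Hz, CV={cv:.3f}, Fano={fano:.3f}")
```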


Putting all this together with the fact that inputs are many and small implies that the system can be described by a probability density function of membrane potentials $P(V(t))$. This probability density function obeys the well-known Fokker-Planck equation. The network dynamics can be studied by analyzing $P(V(t))$, and with the appropriate use of boundary conditions the stationary case ($P^*(V) = P(V(t \to \infty))$) can be solved analytically

$$ \frac{\partial P^*(V)}{\partial t} = \frac{1}{\tau}\left[\Sigma\,\frac{\partial^2 P^*(V)}{\partial V^2} + \frac{\partial}{\partial V}\Big((V - M)\,P^*(V)\Big)\right] = 0, \tag{6.7} $$


⁵ The Poisson process is a stochastic process often used to describe arrival times. The probability of an event (in this case an action potential) occurring in a short interval does not depend on when previous events occurred, so the intervals between events are exponentially distributed. See, for example, [93].

⁶ This does not imply that neurons cannot fire at the same time; the dynamics of the system can bring the model to synchrony. The assumption of uncorrelated firing times implies that, when talking about average behavior, the existence of one spike cannot be directly attributed to another spike. Like all assumptions, this one may not hold under all circumstances.
with $M$ and $\Sigma$ corresponding to the mean and variance of $P^*(V)$ respectively. From this stationary distribution, analytic steady-state solutions can also be obtained for the variables of interest: $\nu^* = \nu(t \to \infty)$, $CV^* = CV(t \to \infty)$, $\mu^* = \mu(t \to \infty)$, $\sigma^* = \sigma(t \to \infty)$ and $\tau^* = \tau(t \to \infty)$, all of which can be found in [108].
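As a rough illustration of the diffusion picture behind the stationary Fokker-Planck equation, the sketch below integrates an Ornstein-Uhlenbeck process with the Euler-Maruyama method. This is a simplifying assumption: it ignores the firing threshold, the reset and the voltage-dependent (conductance-based) noise treated in [108]. The drift is $-(V - M)/\tau$ and the noise amplitude is chosen so that the stationary variance is $\Sigma$, so the long-run samples of $V(t)$ should have mean close to $M$ and variance close to $\Sigma$. The values of `M`, `Sigma` and `tau` are illustrative.

```python
import numpy as np

def simulate_ou(M, Sigma, tau, dt, n_steps, rng):
    """Euler-Maruyama for dV = -(V - M)/tau dt + sqrt(2*Sigma/tau) dW.

    The noise amplitude is chosen so that the stationary variance equals Sigma.
    """
    V = np.empty(n_steps)
    V[0] = M
    noise = rng.standard_normal(n_steps - 1) * np.sqrt(2.0 * Sigma / tau * dt)
    for k in range(n_steps - 1):
        V[k + 1] = V[k] - (V[k] - M) / tau * dt + noise[k]
    return V

M, Sigma, tau = -60.0, 9.0, 0.005   # mV, mV^2, s (illustrative values)
rng = np.random.default_rng(1)
V = simulate_ou(M, Sigma, tau, dt=1e-4, n_steps=400_000, rng=rng)
burn = 10_000                       # discard the initial transient
print(f"mean={V[burn:].mean():.2f} mV, var={V[burn:].var():.2f} mV^2")
```

The small step size relative to $\tau$ keeps the discretization bias in the variance at about 1%.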


6.2.2 Validation and Limitations 


The most important result is that conductance-based IF models are able to replicate (at least qualitatively) the behavior of more realistic models based on the Hodgkin-Huxley equations. Furthermore, the observed behavior produces quantities for $\nu^*$, $CV^*$, $\mu^*$, $\sigma^*$ and $\tau^*$ consistent with what has been observed experimentally in vivo (shown in Figure 6.3(a)), something that current-based IF models cannot do [108]. The advantage of this is that a relatively simple model for which analytic solutions exist can be used to study network dynamics. The Hodgkin-Huxley model, by contrast, is too complex to scale to network models.


IF models represent the activity in large-scale networks, but can 
still make a valid analytical connection to micro-scopic models. 


Aspects that have been ignored in the model include a synaptic time con- 
stant, non-sparse connectivity (both long and short range connections), corre- 
lations in the times of spikes, adaptation of neural firing, synaptic adaptation 
and failure, and dendritic morphology [24]. Keeping in mind that we are in- 
terested only in the modeling for the recognition of epileptic seizures (and 
not more complex tasks such as learning, for which adaptation is important), 
the behavior that we are interested in is captured by a relatively simple model. 
This suggests that the elements that are important in network dynamics are already part of this model, and the simplifications are somewhat validated.

Several activity levels have been identified through simulations. Figure 6.3(b) shows some examples of these: a high-rate regular firing state and a quiescent state, both thought to be unrealistic or abnormal cases, and a low-rate irregular firing state believed to describe the background neural activity. This last state agrees with both observed experimental data and Hodgkin-Huxley models.

What is not clear from published material is how IF models relate to epilep- 
tic seizures. Analytic solutions have identified bifurcations and oscillatory 
regions in parameter space, but are these seizures? If so, what is the physio- 
logical interpretation that leads to epileptic seizures? Does the activity in the 
epileptic focus resemble the quiescent or the high rate activity shown in Figure 
6.3(b)? As far as the validity of the model goes, experimentally available data 
mostly belongs to low-rate, irregular firing, so it is difficult to infer what part 
FIGURE 6.3: The advantage of conductance-based IF models is that complexity is reduced and solutions are consistent with experimentally observed ranges, as shown in (a) where the IF model outputs are compared to real data. This is true for a wide range of external input mean spiking rates $\nu_n$, shown relative to 1.4 Hz, the minimum input rate required to bring a neuron to threshold. $r$ is the ratio of inhibition to recurrent excitation in the network [108]. In (b) are examples of some excitation levels identified in the model: high-rate regular firing (left), quiescent (middle) and low-rate irregular (right). Recall that action potentials are fired when the threshold voltage ($V_{th} = -55$ mV, marked with a horizontal line) is crossed. The first two cases are believed to be unrealistic or abnormal behavior, whereas the low-rate irregular firing coincides with experimentally observed background activity. Graphics were reproduced with permission from their original publication in [108] using data provided by the authors.
of parameter space can or should represent epileptic seizures. To our knowledge, work specific to the disorder has not yet been published.

Another unanswered question is: how does the IF network activity relate to the EEG? Few biologically feasible structural constraints have been put on network connectivity. What is the scale for which the model is valid? An arbitrarily sized patch of cortex, a cortical column, a minicolumn? The role of interactions between sub-systems within the brain has also been ignored, most importantly with the thalamus, which regulates the flow of information and is believed to be involved in the generation or spread of at least some types of seizures. How far can a simple model with analytical solutions be pushed to understand network dynamics? Some of these issues are addressed next.


6.3 Meso-Scopic (Phenomenological) Models 


Meso-scopic models operate at a larger scale than the micro-scopic models be- 
cause they include interactions between sub-systems of the brain, as in Figure 
6.1(c). Again, the move to a larger scale necessitates a change in the simpli- 
fying assumptions because while models in the form of Figure 6.1(b) describe 
a subset of the behavior in (c) we also want to ensure that the computations 
do not become unmanageable. This is analogous to the move from one or a few neurons to networks, as illustrated in Figure 6.2.

As a consequence, the class of models described here is based on biology that is further removed from the dynamics of single neurons. They are phenomenological because the variables are related to the EEG: they describe macro-scopic phenomena. In the process some of the micro-scopic biology
inherent in IF models is lost and many of the parameters important in the 
meso-scopic models are no longer directly measurable. Although these meso- 
scopic models are capable of qualitatively describing some EEG waveforms, it 
is difficult to infer what the smaller-scale mechanisms that lead to a particular 
waveform are. 

The mathematical framework for these models is not specifically constrained to local activity. However, insightful results apply only to local background activity of a homogeneous network of neurons. In this sense they are
not too different from IF models. The difference lies in the modeling approach, 
in particular the inclusion of inter-connections between neural networks. 


Phenomenological models describe the potential fields generated by cortical and sub-cortical neurons. The equations describing the system are, on the whole, deterministic. The model itself is stochastic only when a random external input is assumed. The variables of interest relate to the fields generated by the different types of neurons involved and describe features of their distribution. For neurons of type $a$ in a network these are

• $Q_a(t)$: The mean spiking rate of action potentials.
• $V_a(t)$: The mean membrane potential.
• $\phi_a(t)$: The field generated by the dendritic currents.

In all cases the neurons involved are the excitatory and inhibitory neurons in the cortex ($a = e, i$ respectively) as well as the excitatory and inhibitory neurons in the thalamus ($a = s, r$).

The main advantage of these models is that they are readily related to the EEG waveforms under the assumption that these recordings are proportional to the dendritic fields of the excitatory neurons in the cortex ($\phi_e(t)$), as discussed in Chapter 2.


A large volume of work has been published about these models, pioneered by Walter Freeman [43], who in the early 1990s worked with the olfactory bulb to produce models at the level of a cortical column. Improvements and modifications have been made over time, but the original concepts remain. The group of researchers involved in [148], [144], [146], [147], [154] and [22] advanced the model by including interactions with sub-cortical networks, thus explaining many of the common EEG rhythms. Work in [95] is also relevant.
A summary of the salient results of this model follows. 


6.3.1 Model Summary 


The meso-scopic models describe the average activity of a population of neurons, and the mechanisms of a single neuron are no longer important. The model is explained here, but those interested in a quick reference only can find a summary in Appendix 6.C, complemented by parameters in Appendix 6.A.

The neurons of type $a$ in an interconnected network such as that in Figure 6.1(b) fire action potentials distributed according to a sigmoidal function of the membrane voltage. The firing rate is an average over the population and at any one time depends on the mean membrane potential $V_a(t)$, the mean taken over the ensemble of neurons considered. Although the slope of the sigmoid varies for neurons of different types, the average firing rate $Q_a(t)$ can be described by a single sigmoidal function
FIGURE 6.4: A typical relationship between average firing rate $Q_a$ and mean membrane potential $V_a$ for ensembles of neurons of type $a$. Here $Q_{max} = 250\ \mathrm{s}^{-1}$, $\sigma = 3.3$ mV and $V_{th} = 15$ mV. $Q_a$ is a non-linear function that at an operating point can be approximated by a linear function as shown. This linearization is an approximation that is only valid for values close to the operating point.
$$ Q_a(t) = \frac{Q_{max}}{1 + \exp\left(-\dfrac{V_a(t) - V_{th}}{\sigma}\right)}, \tag{6.8} $$


where $V_{th}$, $\sigma$ and $Q_{max}$ are experimentally measurable parameters that describe the mean firing threshold, its standard deviation and the maximum firing rate considered over the ensemble of neurons, respectively. A typical $Q_a$ is shown in Figure 6.4. $Q_a(t)$ (and similar forms of this equation) is widely used in these models as the mean firing rate of a population of neurons in response to the mean membrane potential, even though experimental justification remains vague⁷.
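The sketch below evaluates the population firing rate assuming the logistic form $Q_a = Q_{max}/(1 + \exp(-(V_a - V_{th})/\sigma))$, with the parameter values quoted for Figure 6.4 ($Q_{max} = 250\ \mathrm{s}^{-1}$, $\sigma = 3.3$ mV, $V_{th} = 15$ mV). Published versions of the model differ slightly in how the sigmoid is scaled, so treat the exact form as an assumption. The sketch shows the three features that matter later: the rate is half-maximal at $V_{th}$, it saturates at $Q_{max}$, and it is close to zero well below threshold.

```python
import numpy as np

Q_MAX = 250.0   # maximum firing rate (s^-1), value from Figure 6.4
SIGMA = 3.3     # spread of firing thresholds (mV), value from Figure 6.4
V_TH = 15.0     # mean firing threshold (mV), value from Figure 6.4

def firing_rate(V):
    """Logistic sigmoid Q_a(V_a): mean population firing rate in s^-1."""
    return Q_MAX / (1.0 + np.exp(-(V - V_TH) / SIGMA))

V = np.linspace(-10.0, 40.0, 6)
for v, q in zip(V, firing_rate(V)):
    print(f"V={v:5.1f} mV  Q={q:7.2f} s^-1")
```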

Thus action potentials are fired at a rate given by $Q_a(t)$, resulting in an electric field $\phi_a(t)$ that is proportional to this firing rate if the axons are short, as is the case for inhibitory neurons of type $a = i$. Excitatory cortical neurons ($a = e$) have longer projections, and the field they generate is related to the firing rate via a low-pass filter, denoted by the operator $D_e$, so that $D_e\,\phi_e(t) = Q_e(t)$. The operator is defined as

$$ D_e = \frac{1}{\gamma_e^2}\frac{d^2}{dt^2} + \frac{2}{\gamma_e}\frac{d}{dt} + 1, \tag{6.9} $$

where $\gamma_e$ is a measure inversely proportional to the range of axons of cortical neurons, and thus also inversely proportional to the time it takes to travel along an axon, on average.


⁷ The input-output functions of most cortical neurons are well known, and they do not really resemble the sigmoidal function shown in Figure 6.4. This function is used to represent the average of these input-output functions.
Action potentials feed into dendrites of neurons within the same population. In some models a second sigmoidal function is used to describe the generation of post-synaptic potentials (PSPs), but this is ignored here for simplicity, hence the assumption that $\phi_a(t)$ is proportional to $Q_a(t)$. The observation that the membrane potential requires the coincidence of many dendritic inputs leads to the postulation of a further filter relating the dendritic field to the mean membrane potential as

$$ D_{\alpha\beta}\,V_a(t) = \sum_b \nu_{ab}\,\phi_b(t), \tag{6.10} $$

where $D_{\alpha\beta}$ is a second order differential operator defined as

$$ D_{\alpha\beta} = \frac{1}{\alpha\beta}\frac{d^2}{dt^2} + \left(\frac{1}{\alpha} + \frac{1}{\beta}\right)\frac{d}{dt} + 1. \tag{6.11} $$

The parameters $1/\alpha$ and $1/\beta$ are the time constants of the filters, which physiologically relate to the rise and decay times of the cell-body potential produced by an impulse at a dendritic synapse. Together they define the shape of the filter. $\nu_{ab}$ is the strength of interaction between neurons of type $b$ and $a$, proportional to the number of synapses from $b$ to $a$. In summary, Equation 6.10 describes the mean membrane potential of neurons of type $a$ resulting from the summed dendritic input from all other neurons, filtered by the dendritic tree in a manner described by $D_{\alpha\beta}$.
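Because $D_{\alpha\beta}$ factorizes as $(\frac{1}{\alpha}\frac{d}{dt} + 1)(\frac{1}{\beta}\frac{d}{dt} + 1)$, the response of $V_a$ to a brief dendritic impulse is a difference of two exponentials, $h(t) = \frac{\alpha\beta}{\beta - \alpha}(e^{-\alpha t} - e^{-\beta t})$ for $\alpha \neq \beta$, peaking at $t = \ln(\beta/\alpha)/(\beta - \alpha)$ and with unit area. The sketch below checks this against a direct fourth-order Runge-Kutta integration of $D_{\alpha\beta} V = \delta(t)$. The values $\alpha = 50\ \mathrm{s}^{-1}$ and $\beta = 200\ \mathrm{s}^{-1}$ are illustrative assumptions, not parameters quoted in this section.

```python
import numpy as np

alpha, beta = 50.0, 200.0   # illustrative rate constants (s^-1), assumptions

def h_analytic(t):
    """Impulse response of D_ab: alpha*beta/(beta-alpha) * (exp(-alpha t) - exp(-beta t))."""
    return alpha * beta / (beta - alpha) * (np.exp(-alpha * t) - np.exp(-beta * t))

def h_numeric(t_end, dt):
    """Integrate V'' + (alpha+beta) V' + alpha*beta V = alpha*beta*delta(t) with RK4.

    The delta input is represented by the initial condition V(0)=0, V'(0)=alpha*beta.
    """
    def f(state):
        v, dv = state
        return np.array([dv, -(alpha + beta) * dv - alpha * beta * v])

    n = int(round(t_end / dt))
    t = np.linspace(0.0, n * dt, n + 1)
    out = np.empty(n + 1)
    state = np.array([0.0, alpha * beta])
    out[0] = state[0]
    for k in range(n):
        k1 = f(state)
        k2 = f(state + 0.5 * dt * k1)
        k3 = f(state + 0.5 * dt * k2)
        k4 = f(state + dt * k3)
        state = state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        out[k + 1] = state[0]
    return t, out

dt = 1e-5
t, h_num = h_numeric(0.1, dt)
err = np.max(np.abs(h_num - h_analytic(t)))
area = h_num.sum() * dt                            # tends to 1 as t_end grows
t_peak = np.log(beta / alpha) / (beta - alpha)     # time of maximum response
print(f"max error = {err:.2e}, area = {area:.3f}, peak at {1e3 * t_peak:.2f} ms")
```

The area over the first 100 ms falls slightly short of 1 because the slow $e^{-\alpha t}$ tail has not yet fully decayed.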

Equations 6.8-6.10 describe the ‘mean’ behavior of an arbitrary neuron 
type within a local network. If the network is representative of a cortical 
area, for example a cortical column as in the IF models, then a = e, i and 
b = e, i,n, where n is the input from other cortical or sub-cortical populations. 
In this averaged form all neurons of a given type behave alike. The EEG is proportional to $\phi_e(t)$, the field produced by excitatory (pyramidal) neurons that align in the cortex (see Chapter 2).

Whilst capable of reproducing many of the observed clinical phenomena,
the cortical column alone is incapable of explaining the generation of certain 
common EEG rhythms (e.g., alpha) and oscillations (e.g., epileptic seizures). 
The addition of connections with another similar but sub-cortical thalamic 
sub-system introduces delays and feedback that are believed by some to be 
responsible for these phenomena. The modification is physiologically reason- 
able because the thalamus is responsible for relaying and regulating sensory 
information to and from the cortex. 

Cortico-thalamic and thalamo-cortical projections occur through the Sen- 
sory Relay Network (SRN), responsible for relaying external sensory informa- 
tion to the cortex, and the Thalamic Reticular Nucleus (TRN), an inhibitory 
interface to regulate flow of information between cortex and thalamus. Transmission delays of $t_0/2$ are introduced between the cortex and the thalamus in each direction, so $t_0$ is the round-trip time for a signal passing through the thalamus and back to the cortex. All other delays are assumed negligible at the frequencies of the EEG.


The behavior of both the thalamus and the cortex follows Equations 6.8-6.10, with the addition of neurons $s$ and $r$, representing the neurons in the SRN and TRN respectively. The equations are also modified to include the feedback and relay that occur between the two neuron populations. The complete setup, along with all neuron types in each sub-system, is shown in Figure 6.1(c). The arrows indicate all the physiologically possible flows of information.

Despite the many simplifications involved, even this model remains relatively complex to analyze. A full behavior analysis (the collection of all possible solutions) is prohibitively complex. The main complicating ingredient is the non-linear characteristic of $Q_a(t)$ in Equation 6.8.

A further complication is the inherent delay in the model. Despite its apparent simplicity, this model is still infinite-dimensional: because of the delay $t_0$, its state at any time includes the history of the variables over the preceding interval, not just their present values. Its full bifurcation analysis remains an open question and is beyond the scope of the present text. The interested reader can find information on bifurcation analysis in texts such as [83] and [159].

Nevertheless some analysis is possible and is discussed next. 


6.3.2 Analysis: Linearization, Stability and Instability 


A typical analysis proceeds as follows. Assuming a constant input of a certain 
magnitude, one can compute a steady state solution (when all variables are 
constant). Note that there may be many steady states, not all biologically 
relevant. Mathematically we want to find a solution $z$ of the system

$$ F(z, \theta, u) = 0. \tag{6.12} $$

Dependence on time $t$ has been removed because we want constant solutions, that is, we want $z(t) = z$, $\theta(t) = \theta$ and $u(t) = u$.
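To illustrate that several steady states can coexist, consider a deliberately reduced, hypothetical case: a single population whose mean potential feeds back on itself, so that a constant solution must satisfy $V^* = \nu Q(V^*) + u$, with $Q$ the logistic sigmoid (Figure 6.4 values). The self-coupling `nu`, the input `u` and the assumed dynamics $\tau\,dV/dt = -V + \nu Q(V) + u$ are illustrative assumptions, not the full meso-scopic model. The sketch scans for sign changes of the residual and refines each with bisection, then applies the local stability test for this one-dimensional case (loop gain $\nu Q'(V^*) < 1$).

```python
import numpy as np

Q_MAX, SIGMA, V_TH = 250.0, 3.3, 15.0   # sigmoid parameters quoted for Figure 6.4
nu, u = 0.1, 0.0                        # hypothetical self-coupling (mV s) and input (mV)

def Q(V):
    return Q_MAX / (1.0 + np.exp(-(V - V_TH) / SIGMA))

def dQ(V):
    """Slope of the logistic sigmoid, (Q/sigma) * (1 - Q/Q_max)."""
    q = Q(V)
    return q / SIGMA * (1.0 - q / Q_MAX)

def F(V):
    """Residual of the steady-state condition V = nu*Q(V) + u."""
    return nu * Q(V) + u - V

def bisect(f, lo, hi, iters=60):
    flo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if np.sign(fmid) == np.sign(flo):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)

grid = np.arange(-20.0, 60.0, 0.5)
vals = F(grid)
roots = [bisect(F, lo, hi)
         for lo, hi, flo, fhi in zip(grid[:-1], grid[1:], vals[:-1], vals[1:])
         if flo * fhi < 0]

# Under the assumed dynamics, a steady state is locally stable when nu*Q'(V*) < 1.
stable = [bool(nu * dQ(v) < 1.0) for v in roots]
for v, s in zip(roots, stable):
    print(f"V* = {v:7.3f} mV, Q = {Q(v):7.2f} s^-1, {'stable' if s else 'unstable'}")
```

With these values there are three steady states (low, intermediate and high activity), and only the outer two are stable, which is why the stability check that follows in the text is needed to decide which solution is observed.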

Next we can check which of the solutions will be observed in practice by 
analyzing the local stability properties of this solution. A common way to do 
this is through its linearization. 

Linearization approximates the non-linear model of Section 6.3.1 by a linear one. Linear systems are desirable because they are much better understood and more tools are available to analyze them.

Linearization is achieved by approximating an arbitrary non-linear curve $f(x)$ by a linear one. This is done at an operating point $x_0$, as shown in Figure 6.4 for $f(x) = Q_a$ and $x = V_a$, the only non-linear component in the meso-scopic model. The linear approximation $\hat{f}(x)$ is given by

$$ \hat{f}(x) = \left.\frac{df(x)}{dx}\right|_{x = x_0}(x - x_0) + f(x_0), \tag{6.13} $$

where the derivative is the gradient of $f(x)$ at $x = x_0$ and $f(x_0)$ is a constant that is zero when equations are shifted so that the operating point
FIGURE 6.5: In (a) small changes in the input $\phi_n$ show that simulations of the non-linear and the linearized system are almost identical. In (b) the changes in $\phi_n$ are large and simulations no longer agree, because the linearized system is an approximation valid only for small changes around an operating point. $\phi_n$ is modeled as zero-mean stochastic white noise with variance 0.001 in (a) and 20 in (b). Such large changes are necessary because the system operating point is relatively unaffected by $\phi_n$.


is at the origin. This linear approximation is very similar to the original non-linear function ($\hat{f}(x) \approx f(x)$) only around the chosen operating point, and results are valid only around $x_0$. Simulations of the linearized and non-linear system should be almost identical if the changes made to the system are small. This is shown to be true in Figure 6.5 when linearization is applied to $Q_a$ in the meso-scopic model.
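The sketch below mirrors Figure 6.5 on the single non-linearity $Q_a$: it builds the first-order approximation of Equation 6.13 around an operating point and measures how far it drifts from the sigmoid for small and for large excursions. The logistic form and the Figure 6.4 parameter values are assumptions carried over from above, and the operating point $V_0 = 10$ mV is an arbitrary illustrative choice.

```python
import numpy as np

Q_MAX, SIGMA, V_TH = 250.0, 3.3, 15.0   # Figure 6.4 values, logistic form assumed

def Q(V):
    return Q_MAX / (1.0 + np.exp(-(V - V_TH) / SIGMA))

def Q_lin(V, V0):
    """Tangent-line approximation of Q around V0 (Equation 6.13 with f = Q, x = V)."""
    q0 = Q(V0)
    slope = q0 / SIGMA * (1.0 - q0 / Q_MAX)   # dQ/dV evaluated at V0
    return slope * (V - V0) + q0

V0 = 10.0
errs = {}
for dV in (0.1, 1.0, 10.0):
    V = V0 + np.linspace(-dV, dV, 201)
    errs[dV] = np.max(np.abs(Q_lin(V, V0) - Q(V))) / Q_MAX
    print(f"excursion +/-{dV:5.1f} mV: max error = {100 * errs[dV]:.3f}% of Q_max")
```

The error grows roughly with the square of the excursion, which is why conclusions drawn from the linearized model are restricted to small changes around the operating point.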

In the model the operating point represents a mode of behavior that is not 
expected to change too quickly because parameters remain relatively stable 
over time. If large changes occur the linearization is no longer valid. This is a 
major limitation in analysis of this type: time-evolutions can only be explored 
with relatively small changes around operating points. Thus often analysis of 
this kind involves understanding many different operating points, and even 
then only a small subset of all possible non-linear behavior is explored. Nev- 
ertheless it often suffices as the behavior around operating points is generally 
the most relevant. 
FIGURE 6.6: The equations describing the model shown in Figure 6.1(c) can be re-written in terms of its steady-state linear behavior. The transfer function is a frequency domain representation of the interactions between cortical ($C(s)$), sub-cortical ($S(s)$) and thalamo-cortical ($T(s)$) sub-systems. The feedback loop formed can be analyzed for steady-state stability. Equations for $C(s)$, $S(s)$ and $T(s)$ can be found in Appendix 6.C.


With some algebra, the linearized meso-scopic model can be written as shown in Figure 6.6, divided into sub-systems $C(s)$, $S(s)$ and $T(s)$ representative of cortical, subcortical and thalamo-cortical interactions. These are spectral domain representations with $s = 2\pi f j$ for steady-state solutions, where $f$ is frequency in Hz. For the EEG the relevant range is $0 < f < 100$ Hz. $C(s)$, $S(s)$ and $T(s)$ are known as the transfer functions of the system, and they express the relationship between the input and the output of each sub-system. They are

$$ \begin{aligned} C(s) &= \frac{L(s)G_{es}}{1 - L(s)G_{ei}} \cdot \frac{A(s)}{1 - \dfrac{L(s)G_{ee}}{1 - L(s)G_{ei}}A(s)},\\ S(s) &= \frac{L(s)G_{se} + L(s)G_{sr}L(s)G_{re}}{1 - L(s)G_{sr}L(s)G_{rs}}\,e^{-s t_0},\\ T(s) &= \frac{L(s)G_{sn}}{1 - L(s)G_{sr}L(s)G_{rs}}\,e^{-s t_0/2},\\ A(s) &= \frac{1}{(1 + s/\gamma_e)^2},\\ L(s) &= \frac{1}{(1 + s/\alpha)(1 + s/\beta)}. \end{aligned} \tag{6.14} $$


$G$ are the gains of the linearized system that result from the linearization process. They are calculated as

$$ G_{ab} = \left.\frac{\partial Q_a}{\partial V_a}\right|_{V_a = V_a^*}\nu_{ab} = \frac{\phi_a^*}{\sigma}\left(1 - \frac{\phi_a^*}{Q_{max}}\right)\nu_{ab}, \tag{6.15} $$

where $\phi_a^*$ and $V_a^*$ are the solutions of $\phi_a(t)$ and $V_a(t)$ at equilibrium (at equilibrium $\phi_a^* = Q_a(V_a^*)$).
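A sketch of the gain calculation, assuming the logistic $Q_a$ with the Figure 6.4 values: the gain is the sigmoid slope at equilibrium, written in terms of $\phi_a^* = Q_a(V_a^*)$, multiplied by the coupling strength $\nu_{ab}$. The entries of `nu_ab` are hypothetical placeholders, not fitted parameters. A central finite difference confirms the closed-form slope, and evaluating at different equilibria shows that the same $\nu_{ab}$ gives a much smaller gain when the population operates near saturation.

```python
import numpy as np

Q_MAX, SIGMA, V_TH = 250.0, 3.3, 15.0   # Figure 6.4 values, logistic form assumed

def Q(V):
    return Q_MAX / (1.0 + np.exp(-(V - V_TH) / SIGMA))

def gain(V_star, nu_ab):
    """G_ab = (phi*/sigma) * (1 - phi*/Q_max) * nu_ab, with phi* = Q(V*) at equilibrium."""
    phi = Q(V_star)
    return phi / SIGMA * (1.0 - phi / Q_MAX) * nu_ab

nu_ab = {"ee": 1.2e-3, "ei": -1.8e-3, "es": 1.2e-3}   # hypothetical couplings (mV s)

h = 1e-4
for V_star in (5.0, 15.0, 30.0):
    slope_fd = (Q(V_star + h) - Q(V_star - h)) / (2 * h)   # numerical dQ/dV
    G = {k: gain(V_star, v) for k, v in nu_ab.items()}
    print(V_star, round(slope_fd, 4), {k: round(g, 4) for k, g in G.items()})
```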
An overall system transfer function between input $\phi_n(s)$ and output $\phi_e(s)$ is given by

$$ \frac{\phi_e(s)}{\phi_n(s)} = \frac{C(s)\,T(s)}{1 - C(s)\,S(s)}. \tag{6.16} $$
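The sketch below evaluates the closed loop $\phi_e/\phi_n = C T/(1 - C S)$ on $s = 2\pi f j$ across the EEG band, using the forms of $C(s)$, $S(s)$, $T(s)$, $A(s)$ and $L(s)$ written out in the code. All gains, rate constants and the delay are illustrative placeholders chosen only to exercise the formulas; they are not the fitted values behind Figures 6.8-6.10. Because the input is white noise, $|\phi_e/\phi_n|^2$ is the shape of the simulated EEG power spectrum.

```python
import numpy as np

# Illustrative placeholder parameters (assumptions, not fitted values).
alpha, beta, gamma_e = 60.0, 240.0, 100.0      # s^-1
t0 = 0.08                                       # cortico-thalamic round-trip delay (s)
G = dict(ee=2.0, ei=-4.0, es=1.5, se=1.0, sr=-0.8, rs=0.5, re=0.3, sn=1.0)

def L(s):
    """Dendritic filter: 1 / ((1 + s/alpha)(1 + s/beta))."""
    return 1.0 / ((1 + s / alpha) * (1 + s / beta))

def A(s):
    """Axonal low-pass filter: 1 / (1 + s/gamma_e)^2."""
    return 1.0 / (1 + s / gamma_e) ** 2

def C(s):
    """Cortical transfer function."""
    ratio = L(s) * G["ee"] / (1 - L(s) * G["ei"])
    return L(s) * G["es"] / (1 - L(s) * G["ei"]) * A(s) / (1 - ratio * A(s))

def thalamic_loop(s):
    """Denominator from the SRN-TRN loop."""
    return 1 - L(s) * G["sr"] * L(s) * G["rs"]

def S(s):
    """Cortex -> thalamus -> cortex path, including the full round-trip delay."""
    return (L(s) * G["se"] + L(s) * G["sr"] * L(s) * G["re"]) / thalamic_loop(s) * np.exp(-s * t0)

def T(s):
    """External input -> SRN -> cortex path, including the one-way delay."""
    return L(s) * G["sn"] / thalamic_loop(s) * np.exp(-s * t0 / 2)

def H(s):
    """Closed-loop response phi_e/phi_n = C T / (1 - C S)."""
    return C(s) * T(s) / (1 - C(s) * S(s))

f = np.linspace(0.5, 100.0, 400)       # EEG band (Hz)
P = np.abs(H(2j * np.pi * f)) ** 2      # power spectrum for unit white-noise input
print(f"spectral peak at {f[np.argmax(P)]:.1f} Hz")
```

At $s = 0$ every filter and delay term equals 1, so $H(0)$ reduces to an expression in the gains alone, which gives a quick consistency check.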

Now that we have a simpler (linear) model we can use linear tools to 
analyze its stability. A solution is stable provided small changes in the present 
do not have a major effect on the solutions of the indefinite future. We want 
a small change in the starting condition to lead to responses that remain close 
forever. 

We can use the transfer function of a system directly to infer stability. The denominator of this function has roots, also known as poles. If all of their real parts are negative, the system is stable. The imaginary axis forms the instability boundary. Poles that are purely imaginary indicate oscillations or resonances. In this case the imaginary part of the pole is the angular frequency of the resonance, that is, its frequency in Hz multiplied by $2\pi$. Poles that are very close to the imaginary axis also form resonances (although they are stable if their real parts are negative), because small perturbations of the system can lead it closer to the imaginary axis. The dominant frequency is determined by the pole that is closest to the instability boundary⁸. In our model the input $\phi_n$ is stochastic and thus a wide range of frequencies is excited. If there are any poles close to the imaginary axis, the simulations will show these resonances.

It is useful to visualize stability by plotting the real parts of all poles 
on the horizontal axis and imaginary parts on the vertical axis. A pictorial 
representation for a system with only 1 pair of complex poles is shown in 
Figure 6.7. 
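For a rational transfer function, the poles can be found numerically as the roots of the denominator polynomial. The sketch below does this for the generic second-order denominator $s^2 + 2\zeta\omega_0 s + \omega_0^2$, a stand-in rather than the meso-scopic model itself (whose delay makes the denominator non-polynomial). Changing the sign of the damping $\zeta$ reproduces the three cases of Figure 6.7, and the frequency is read off the imaginary part as $f = \omega/2\pi$. The tolerance used to decide that a pole lies on the imaginary axis is an assumption.

```python
import numpy as np

def classify(den_coeffs, tol=1e-9):
    """Return (label, poles) for a transfer function with the given denominator.

    Coefficients are in descending powers of s, as expected by numpy.roots.
    """
    poles = np.roots(den_coeffs)
    max_re = np.max(poles.real)
    if max_re < -tol:
        label = "stable"
    elif max_re <= tol:
        label = "marginally stable"
    else:
        label = "unstable"
    return label, poles

w0 = 2 * np.pi * 10.0                    # 10 Hz resonance (rad/s)
for zeta in (0.1, 0.0, -0.1):
    label, poles = classify([1.0, 2 * zeta * w0, w0 ** 2])
    f_hz = np.max(np.abs(poles.imag)) / (2 * np.pi)
    print(f"zeta={zeta:+.1f}: {label}, poles={np.round(poles, 2)}, f={f_hz:.2f} Hz")
```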




A stable linear system is incapable of describing sustained oscillations. Transitions into oscillatory behavior in the non-linear system are transitions to instability in the linearized system. Distinguishing between stability and instability is important in the study of epilepsy. Epileptic seizures are identified with oscillatory waveforms; thus in the linearized system transitions to epileptic seizures are represented by transitions into instability.


The stability of a linear system is often studied through its transfer functions: a linear system is stable if the real parts of all the roots of the denominator, also known as poles, are negative. Graphical representation of the poles is an easy way to determine whether the system is stable or not.


⁸ The theory and proof behind this result are an important part of electrical engineering, and are omitted here only for brevity. Information can be found in most introductory signals and systems books, such as [174], or [6] for simpler and more intuitive explanations.
FIGURE 6.7: Stability of a linear system can be inferred by looking at the
poles (zeros of the denominator) of the relevant transfer functions. The real 
part of the poles is plotted on the horizontal axis and the imaginary on the 
vertical axis. In (a) we see a stable system because its two poles both have 
negative real parts. The simulated time domain signal (step response) oscil- 
lates but eventually settles to its steady-state value. In (b) is a marginally 
stable system because its two poles lie on the imaginary axis. The simulated 
time series is oscillatory. Finally in (c) we see an unstable system because 
its two poles have positive real parts. The simulated time series continues to 
grow indefinitely. The imaginary axis is called the stability boundary because 
it marks the transition from stable to unstable behavior. This stability analysis can be applied to non-linear systems that have been linearized, although a system that is unstable when linearized may still be bounded in the non-linear one. Biologically, the unstable linear response will be limited in the non-linear system and often give rise to a periodic bounded response.
FIGURE 6.8: Stability boundaries of the meso-scopic model, shown in a reduced co-ordinate system that separates the stability of $C(s)$ in the $x$ axis, $S(s)$ in the $y$ axis and $T(s)$ in the $z$ axis. Specifically the axes map to $x = G_{ee}/(1 - G_{ei})$, $y = (S_a + S_i)/[(1 - S_r)(1 - G_{ei})]$ and $z = -S_r\,\alpha\beta/(\alpha + \beta)^2$, with $S_r = G_{sr}G_{rs}$, $S_a = G_{es}G_{se}$ and $S_i = G_{es}G_{sr}G_{re}$. Each shaded region shows a different type of instability that can be used to describe the EEG oscillations of different seizures. Stable regions can explain the non-epileptic EEG. Also shown are the locations in parameter space of different states of awareness: eyes open (EO), eyes closed (EC) and one type of sleep (S). This figure is reproduced from [22], page 3, with permission from Oxford University Press.


Using this theory and a reduced co-ordinate system defined by

$$ x = \frac{G_{ee}}{1 - G_{ei}}, \qquad y = \frac{S_a + S_i}{(1 - S_r)(1 - G_{ei})}, \qquad z = -S_r\,\frac{\alpha\beta}{(\alpha + \beta)^2}, \tag{6.17} $$

with $S_r = G_{sr}G_{rs}$, $S_a = G_{es}G_{se}$ and $S_i = G_{es}G_{sr}G_{re}$, the authors of [146] have identified the stability boundaries of Equation 6.16 that are shown in Figure 6.8. Here a particular set of parameter values ($\theta$) corresponds to a behavior represented by a location in this three-dimensional plot. If the behavior of the non-linear system is oscillatory, then the linearized system is unstable and its poles have crossed the stability boundaries shown. If instead
FIGURE 6.9: The importance of the delay $t_0$ between cortex and sub-cortex is demonstrated in the above example. The 10 Hz alpha rhythm in (a) is only possible because poles can get close to the instability boundary. When $t_0 = 0$ in (b) the poles are driven further away from the imaginary axis, and these instabilities are more difficult to generate under normal conditions. A non-zero $t_0$ also introduces an infinite number of poles (not shown), allowing for greater variability in the behavior of the model.


the behavior is stable in the linearized system then these boundaries have not 
been breached. 
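The reduced co-ordinates are simple arithmetic on the gains, as the sketch below shows. The gains and rate constants are hypothetical placeholders, not values from [146] or [22]. One derived relation helps in reading Figure 6.8: evaluating the transfer functions of Equation 6.14 at $s = 0$ (where $L$, $A$ and the delay terms all equal 1) gives a zero-frequency loop gain $C(0)S(0) = y/(1 - x)$, so the zero-frequency stability boundary corresponds to $x + y = 1$.

```python
# Placeholder gains and rate constants (assumptions, not fitted values).
G = dict(ee=2.0, ei=-4.0, es=1.5, se=1.0, sr=-0.8, rs=0.5, re=0.3)
alpha, beta = 60.0, 240.0

def reduced_coordinates(G, alpha, beta):
    """Map gains to the (x, y, z) co-ordinates of Equation 6.17."""
    S_r = G["sr"] * G["rs"]                # SRN-TRN loop
    S_a = G["es"] * G["se"]                # direct cortico-thalamo-cortical loop
    S_i = G["es"] * G["sr"] * G["re"]      # loop through the TRN
    x = G["ee"] / (1 - G["ei"])
    y = (S_a + S_i) / ((1 - S_r) * (1 - G["ei"]))
    z = -S_r * alpha * beta / (alpha + beta) ** 2
    return x, y, z

x, y, z = reduced_coordinates(G, alpha, beta)
# Zero-frequency loop gain C(0)S(0) from Equation 6.14 with L(0) = A(0) = 1 and e^0 = 1.
C0 = G["es"] / (1 - G["ei"] - G["ee"])
S0 = (G["se"] + G["sr"] * G["re"]) / (1 - G["sr"] * G["rs"])
print(f"x={x:.4f}, y={y:.4f}, z={z:.4f}, x+y={x + y:.4f}, C(0)S(0)={C0 * S0:.4f}")
```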

The significance of this work is that instabilities of different frequencies 
have been identified. One parameter (or several) can be changed so that 
the behavior crosses one of these stability boundaries. This is representative 
of transitions between oscillatory and stable behavior, theoretically linked 
to transitions between seizure and non-seizure activity as well as common 
rhythms. The oscillations can take different frequencies and are able to de- 
scribe the various types of seizures that are observed in the real EEG. 

Next we describe how to interpret these stability boundaries in the context 
of EEG and epileptic seizures in particular. 
6.3.3 Validation and Limitations: Rhythms in the EEG


The behavior of the meso-scopic model can be analyzed by studying how parameters affect the spectrum of the simulated EEG defined by Equation 6.16. While some parameters affect only the power of the signal (e.g., $G_{sn}$), others affect the range of the frequency spectrum (e.g., $\gamma_e$) or the position of the spectral peaks or resonances (e.g., $\alpha$, $\beta$). Whether these resonances occur at all is largely determined by the inclusion of the delay $t_0/2$ between cortex and sub-cortex.

As an example, Figure 6.9 shows the plot of the poles (stability analysis) of the model with identical parameters except $t_0 = 80$ ms in (a) and $t_0 = 0$ ms in (b). The delay is not only responsible for introducing more poles into the system (in fact, an infinite number) but also pushes the entire plot closer to the instability boundary, thus making rhythms more likely. Without delays, resonances occur very rarely. This holds in general.
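A scalar toy example shows both effects of a delay. Take the hypothetical delayed negative-feedback loop with characteristic equation $s + a + K e^{-s\tau} = 0$ (a stand-in, not the cortico-thalamic model). With $\tau = 0$ it has a single pole at $-(a + K)$. With any $\tau > 0$ it has infinitely many poles, one for each branch $k$ of the Lambert W function: $s_k = -a + W_k(-K\tau e^{a\tau})/\tau$. The values of `a` and `K` are illustrative, and $\tau = 0.08$ s is chosen only to echo $t_0 = 80$ ms in Figure 6.9(a). The sketch uses `scipy.special.lambertw`.

```python
import numpy as np
from scipy.special import lambertw

a, K = 10.0, 20.0      # hypothetical decay rate and feedback gain (s^-1)
tau = 0.08             # delay (s)

def delay_poles(a, K, tau, branches=range(-3, 4)):
    """Roots of s + a + K*exp(-s*tau) = 0 from the Lambert W branches W_k."""
    z = -K * tau * np.exp(a * tau)
    return np.array([-a + lambertw(z, k, tol=1e-12) / tau for k in branches])

poles = delay_poles(a, K, tau)
residual = np.abs(poles + a + K * np.exp(-poles * tau))   # should be ~0 for every root
print("no delay: single pole at", -(a + K))
for s in sorted(poles, key=lambda p: -p.real):
    print(f"Re = {s.real:8.2f} s^-1, f = {abs(s.imag) / (2 * np.pi):6.2f} Hz")
```

Only a few branches are listed, but every integer $k$ contributes a further pole. The rightmost pole, from the principal branch $k = 0$, moves from $-30\ \mathrm{s}^{-1}$ to within a few $\mathrm{s}^{-1}$ of the imaginary axis and acquires an oscillatory component, which is the qualitative behavior described for Figure 6.9.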

At present, models similar in nature to this meso-scopic model are the most suitable to describe epileptic seizures at the scale of the EEG, whilst maintaining relatively strong links to the underlying biology. Next we show that their dynamics are consistent with both normal and epileptic waveforms. The solutions presented are numerical simulations because, unlike for the IF models, no analytic solution for the phenomenological model exists.


6.3.3.1 Simulating the Normal EEG 


The meso-scopic model has been used to fit real single-channel EEG data acquired from intra-cranial and scalp electrodes, and was shown to produce waveforms and spectra similar to typically observed phenomena. Examples of these waveforms can be found in Figure 6.10. Waveforms (a) and (b) have been associated with the alpha rhythm with eyes closed and eyes open respectively, whereas (c) resembles sleep-like waveforms. Their spectra, shown in Figure 6.12(a), agree with the results shown in Figure 3.15: a strong 10 Hz resonance for the alpha rhythm, also present but to a much lesser extent in the eyes-open example, where the alpha activity is damped by visual stimulation. During sleep it is the lower frequencies that dominate.

A plot of the poles is also provided. In all cases the system is stable because the instability boundary is not breached, but in the case of the alpha rhythm some poles lie very close to the stability boundary. The imaginary part ω of these poles indicates that the frequency of the dominant resonance is f = ω/2π ≈ 10 Hz, i.e., consistent with the alpha rhythm. With the eyes open in (b), poles with similar imaginary parts still exist, but they are no longer the ones closest to the instability boundary, so the 10 Hz resonance does not dominate. During sleep in (c) the dominant poles are of much lower frequencies; the higher-frequency poles are too distant from the boundary to contribute significantly to the signal.
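The link between pole location and spectral peak can be checked on a toy example. The sketch assumes a hypothetical pair of complex-conjugate poles p = -σ ± i·2π·10 driven by white noise; it is not the meso-scopic model itself. When σ is small (poles near the imaginary axis) the power spectrum has a sharp peak close to Im(p)/2π = 10 Hz; moving the poles away from the axis flattens the peak, much as in the eyes-open case.

```python
import numpy as np

def psd_two_pole(freqs_hz, sigma, f0_hz):
    """|H(i*omega)|^2 for H(s) = 1 / ((s - p)(s - conj(p))), p = -sigma + i*omega0.

    With a unit-power white-noise input this is the output PSD (up to a constant).
    """
    omega = 2 * np.pi * freqs_hz
    omega0 = 2 * np.pi * f0_hz
    return 1.0 / ((sigma**2 + (omega - omega0) ** 2) * (sigma**2 + (omega + omega0) ** 2))

freqs = np.arange(0.0, 40.0, 0.01)  # Hz
for sigma in (2.0, 40.0):  # distance of the poles from the imaginary axis (1/s)
    psd = psd_two_pole(freqs, sigma, f0_hz=10.0)
    f_peak = freqs[np.argmax(psd)]
    sharpness = psd.max() / psd[0]  # peak power relative to power at 0 Hz
    print(f"sigma = {sigma:5.1f}: peak at {f_peak:5.2f} Hz, peak/DC ratio = {sharpness:8.1f}")
```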
FIGURE 6.10: Example simulated waveforms that have been associated, by visual inspection, with typical EEG signals, as labeled. These are stable examples because all poles have negative real parts; the waveform is driven by the stochastic input φ_n, modeled as white noise. (a) is more oscillatory than (b) or (c) because its poles are very close to the imaginary axis. The dominant frequency of oscillation corresponds to the closest pole, roughly 10 Hz. The simulated PSDs of these waveforms can be found in Figure 6.12(a).
The model is validated by these relationships to real experimental data and also by the location of the waveforms in the stability diagram presented in Figure 6.8. For example, the change between eyes open and eyes closed is predominantly a shift along the y axis, which represents cortico-thalamic stability: closing the eyes removes the influence of external stimuli relayed by the thalamus. Sleep stages occur close to instability boundaries, which is supported by the fact that many people experience seizures in the transition into and out of sleep. The stability diagram also suggests that the alpha rhythm is itself an 'instability', and a region of parameter space that corresponds to an oscillation of roughly 10 Hz has been identified.


6.3.3.2 Simulating the Seizure EEG 


Recall that epileptic seizures can be identified in the linearized system by its instabilities. Linearization approximates non-linear behavior, but once the linearized system is unstable it can, by and large, no longer predict the behavior of the non-linear one. Thus a system whose linearization is unstable can still be purely oscillatory, with bounded amplitude, in the non-linear domain, because 'new' solutions (such as limit cycles) exist in the non-linear dynamics.
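A standard textbook system, used here as a stand-in rather than the meso-scopic model, makes this concrete: the van der Pol oscillator. Its linearization at the origin has eigenvalues with positive real parts for μ > 0, so the linear prediction is unbounded growth, yet the full non-linear system settles onto a bounded limit cycle of amplitude close to 2. The value μ = 0.5, the initial condition and the RK4 integrator are illustrative choices.

```python
import cmath

MU = 0.5  # hypothetical non-linearity strength (mu > 0)

def vdp(x, v):
    """Van der Pol oscillator x'' - mu (1 - x^2) x' + x = 0 as a 2-D system."""
    return v, MU * (1.0 - x * x) * v - x

def linear_eigenvalues(mu):
    """Eigenvalues of the Jacobian [[0, 1], [-1, mu]] at the origin."""
    root = cmath.sqrt(mu * mu - 4.0)
    return (mu + root) / 2.0, (mu - root) / 2.0

def rk4_step(x, v, dt):
    """One classical Runge-Kutta step."""
    k1 = vdp(x, v)
    k2 = vdp(x + 0.5 * dt * k1[0], v + 0.5 * dt * k1[1])
    k3 = vdp(x + 0.5 * dt * k2[0], v + 0.5 * dt * k2[1])
    k4 = vdp(x + dt * k3[0], v + dt * k3[1])
    x += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
    v += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return x, v

def late_amplitude(t_end=100.0, t_settle=60.0, dt=0.01):
    """Start near the (unstable) origin and return max |x| after transients."""
    x, v = 0.01, 0.0
    amp = 0.0
    for n in range(int(t_end / dt)):
        x, v = rk4_step(x, v, dt)
        if n * dt > t_settle:
            amp = max(amp, abs(x))
    return amp

print("linearized eigenvalues:", linear_eigenvalues(MU))  # positive real parts
print("late-time amplitude:", round(late_amplitude(), 3))  # bounded, near 2
```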

In Figure 6.11 we simulate three types of seizure-like activity possible with the meso-scopic model. In all cases the linearized system is unstable because at least one of the poles has a positive real part. In (a) and (b) the oscillations are caused by instabilities at 20 Hz. In contrast, the waveform in Figure 6.11(c) shows an oscillation of much lower frequency, roughly 3 Hz.

Notice that in (a) and (b) the poles correctly estimate the frequency content observed in the simulated signal, as shown in Figure 6.12(b). This is because in both cases there is only one pair of unstable poles. In (c), however, three pairs of poles have crossed the imaginary axis. In this case many other bifurcations may have taken place, and the linearized system is no longer adequate to describe what is observed in the simulated time series. For example, the 3 Hz activity is not predicted by the imaginary part of any of these poles. Analysis of multiple bifurcations is beyond the scope of this book.

Each oscillation results from a different choice of parameters, corresponding to a different location (beyond the instability boundaries) in Figure 6.8. The different types of instabilities have been identified in [146] as representative of different types of epilepsy, as labeled in Figure 6.8. Whereas the spindle instabilities shown in Figure 6.11(a) and (b) are reminiscent of focal (i.e., secondarily generalized) seizures, the 3 Hz waveform in (c) is related to absence epilepsy. That the different instabilities shaded in Figure 6.8 occupy distinct regions supports the theory that different mechanisms are responsible for focal and non-focal epilepsies. Variations in waveform between epilepsies of the same type are explained by slight changes in the parameter values.


FIGURE 6.11: Example simulated waveforms corresponding to different types of instabilities. The input φ_n has hardly any effect on the output. (a) and (b) show spindle instabilities dominated by a 20 Hz signal; because of their appearance they have been associated with stages in seizures caused by focal epilepsies. (c) shows a 3 Hz waveform often observed in absence epilepsies. The spectra show that in (a) and (b) the dominant frequencies correspond to the unstable poles closest to the imaginary axis. This is not so in (c), where there are multiple unstable poles, meaning that the system may have undergone further bifurcations; in this case the linearized system cannot predict the behavior of the simulated signal. The simulated PSDs of these waveforms can be found in Figure 6.12(b).
FIGURE 6.12: Spectra of model-simulated waveforms. In (a) are the normal states simulated in Figure 6.10, and in (b) are the epileptic states simulated in Figure 6.11. The spectra in both panels show that the dominant frequencies are those corresponding to the pole closest to the instability boundary. Also note that the spectra in (a) closely correspond to the real EEG examples provided in Chapter 3.


6.3.3.3 Caution 


Despite the growing body of support for meso-scopic models of this type, it is important to remember that their validity remains largely speculative. For example, this model assumes that the alpha rhythm is generated by the cortico-thalamic loop, and model simulations and analysis support this theory. However, other models provide similar support for the theory that the alpha rhythm arises from cortico-cortical interactions (see [95]). Without experimental evidence to refute either paradigm, each research group continues seeking and finding support for its own theory. The developers of the meso-scopic model presented here have gone to great lengths to link it to normal EEG, epileptic EEG, sleep disorders, etc. However, the models are still local and cannot explain everything.


Even though close relationships between data and model are demonstrated in this section, it is only the phenomena that the model explains. More complex effects (e.g., cognition) are beyond the scope of models at this level. Furthermore, although these models are used to explain global phenomena, their analysis is performed at a local scale, so the extrapolation to global dynamics remains speculative.

In any case, the phenomenological models are the closest to the biology and the most developed of the models capable of describing EEG dynamics in terms of meso-scopic, if not micro-scopic, biological detail.


6.3.4 Relationship to Micro-Scopic Models 


Let us now look at the connection between the micro-scopic (IF) models presented in Section 6.2 and the meso-scopic models presented here. Even though the meso-scopic models incorporate sub-cortical interactions, the two are not inconsistent with one another. Both begin by describing the behavior of ensembles of neurons such as those in Figure 6.1(b), and although not necessarily in the same proportions, there is no reason why these ensembles cannot be the same.

Assuming they both describe similar networks, the difference lies in the experimental data used to validate them. The IF models are easily linked to and supported by experimental evidence measured at the micro-scopic scale, but little is known about how they relate to macro-scopic measurements. The meso-scopic models, on the other hand, are easily linked to macro-scopic (EEG) measurements, but their link to the micro-scopic scale is not clear.

Ideally we would like a unified model that is easily related to both micro-scopic and macro-scopic activity. Direct measurements at every scale are not required, but if we understood the link between measurements at different scales, then disorders such as epilepsy that transcend spatial and temporal scales could be better understood. For example, an epileptic seizure is a macro-scopic phenomenon because large regions of the brain must be involved to sustain such activity. However, a seizure often starts at a meso-scopic scale (e.g., at an abnormal tissue focus), and the reason it starts is likely explained by the physiology and biochemistry of the micro-scopic tissue.

In any case, the mathematical links between such models remain largely unexplored, even though definitive links do exist. Admittedly, some differences are irreconcilable because the fundamental strategies used to derive the equations are not the same. Most notable is the differentiation between types of neurons in the meso-scopic model, where a different Q_a(t), V_a(t) and φ_a(t) exists for each of a = e, i, r, s.⁹ This differentiation is important in demonstrating the correspondence between the model and the EEG: it is only the cortical excitatory field φ_e(t) that affects this measurement, but this variable does not exist in the micro-scopic model. A considerable amount of work is necessary to rewrite these equations in this form.

⁹ Interestingly, this differentiation is not uniform throughout; for example, the parameters describing the mean threshold potential and its variance are assumed the same for all neurons.

Other differences are smaller. For example, many links can be made between the parameters and variables of the two models, the most obvious being the mean spiking rate (v(t), Q_a(t)) and the mean membrane potential (u(t), V_a(t)). These are the same quantities in each model, the only difference being that in the meso-scopic case they depend on the neuron type a, whereas in the micro-scopic case they are common to the whole population. Would an average taken over a = e, i equate the two directly? (Should it?) Alternatively, it may be relatively simple to incorporate different populations of neurons into the IF models.


The links between the IF and meso-scopic models must be made explicit so that the meso-scopic model can be validated (or alternatively invalidated) physiologically as well as phenomenologically. Even though solutions will likely remain numerical rather than analytic, they would be corroborated by biological evidence, an important step toward the creation of a single unified model in which inter-scale interactions are considered (see Figure 1.11).


Alternatively, a new model can be created in which each of the sub-populations of the meso-scopic models in Figure 6.1(c) is replaced by the IF equations and linked in a similar way. However, it seems wasteful to disregard the large volume of work on the meso-scopic models. The smaller task is to identify these relationships.


6.4 Macro-Scopic Models (Future Outlook) 


No computationally tractable macro-scopic model of brain activity based on the active sources in the brain exists, even for a few centimeters of cortex. Recall that the behavior of the single neuron is a subset of the behavior of a (meso-scopic) network, and these networks are themselves a subset of the behavior at the macro-scopic level. Joining such sub-systems together quickly becomes (mathematically and computationally) problematic.

However, until mathematical analysis and computational simulation catch up, it is still worth discussing the framework, consistent at all spatial and temporal scales, that macro-scopic models may take. In Chapter 1 (see Figure 1.11) it was postulated that larger scale models must account for the dynamics at smaller scales, while at the same time large-scale dynamics influence the behavior of the smaller scales. While this is true, models at both (or all) scales must exist for such a unified solution to be formulated. The focus is then on developing a framework consistent at all scales. The basic idea of an approach to macro-scopic modeling is presented here; the reader interested in more detail can refer to [122] and [120].

FIGURE 6.13: A macro-scopic model can be derived by joining sub-systems such as those in Figure 6.1(b) and (c). Systems are coupled with strengths C_1, C_2, C_3, ... as shown. If all sub-systems are coupled to each other, the complexity of the macro-scopic model becomes unmanageable. A simpler approach is to have a single coupling parameter β that describes the average strength of interaction between sub-systems.

A meso-scopic model is derived by connecting micro-scopic sub-systems together. Similarly, a macro-scopic model can be constructed by connecting these meso-scopic sub-systems together, as shown in Figure 6.13, in a way consistent with the organization of the brain as a whole. Connections such as cortico-cortical projections, which by far outnumber thalamo-cortical ones but have been ignored in both the IF and phenomenological models, can then be included.

However, the problem quickly becomes intractable, because not only must each sub-system be modeled by the equations described, but additional parameters indicating the strength of the interactions between them must be included. In addition, macro-scopic models cannot ignore the effects of volume conduction discussed in Chapter 2: the spatial scales involved are larger, and the conducting properties of the materials must be added to complement the active sources. The complexity of the problem grows very rapidly.

A gross simplification is to use a single extra parameter, β, that represents how much of the dynamics is due to local versus global activity. β is referred to as the background excitability control parameter, which determines the amount of coupling between sub-systems. This is a simplistic approach that is nevertheless capable of explaining much of the global EEG dynamics. The idea is elaborated in some detail in [120] and [122]. These authors postulate that in states of low cognition (e.g., anesthesia, sleep, resting alpha activity) β is (arbitrarily) higher and it is the global modes that dominate. Slower frequencies are more evident, particularly in scalp EEG, which, owing to spatial integration and filtering, is inherently biased toward global cortical activity. More complex or alert states of cognition lead to a reduction of low-frequency components and an increase in high frequencies in the EEG, suggesting greater independence of activity between different sites in the cortex. Here it is the local modes that dominate, and β decreases accordingly.
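The effect of a single global coupling knob can be illustrated with a standard stand-in that is not the framework of [120] and [122]: the Kuramoto model of globally coupled phase oscillators. Here the coupling strength K plays the role of a hypothetical analogue of β, and the order parameter r measures how much the oscillators share a common (global) mode. The number of oscillators, the Gaussian frequency distribution and the values of K are assumptions chosen only for illustration.

```python
import numpy as np

def kuramoto_order(K, n_osc=200, t_end=100.0, dt=0.05, seed=0):
    """Time-averaged synchrony r of globally coupled phase oscillators (Kuramoto model).

    d(theta_i)/dt = omega_i + (K / N) * sum_j sin(theta_j - theta_i)
    r near 0: independent ('local') activity; r near 1: a shared 'global' mode.
    """
    rng = np.random.default_rng(seed)
    omega = rng.normal(0.0, 1.0, n_osc)          # natural frequencies (rad/s)
    theta = rng.uniform(0.0, 2 * np.pi, n_osc)   # random initial phases
    steps = int(t_end / dt)
    r_hist = []
    for n in range(steps):
        mean_field = np.mean(np.exp(1j * theta))
        r, psi = np.abs(mean_field), np.angle(mean_field)
        # (K/N) * sum_j sin(theta_j - theta_i) equals K * r * sin(psi - theta_i).
        theta = theta + dt * (omega + K * r * np.sin(psi - theta))
        if n >= steps // 2:                       # discard the first half as transient
            r_hist.append(r)
    return float(np.mean(r_hist))

for K in (0.5, 4.0):
    print(f"K = {K}: mean order parameter r = {kuramoto_order(K):.2f}")
```

Weak coupling leaves the oscillators largely independent, while strong coupling pulls them into a common rhythm, which is the qualitative picture of local versus global dominance described above.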

In their discussions the parameter β depends on non-specific chemicals (e.g., neuromodulators), and can be related to the effects of slow-acting neuromodulators that are responsible (in part) for determining the excitability of regions of cortex to local versus global contributions. The thalamus is also thought to play a role in regulating the amount of coupling between sub-systems.

Although the results are abstracted from much of the micro-scopic biology, the framework is very well developed and experimentally substantiated. Their work concentrates on global aspects such as the effects of volume conduction, spatial filtering and the shape and size of the head, but it does not deny the importance of local dynamics. This framework does not exclude the possibility of incorporating smaller scale models to represent the active mechanisms of the brain; their point is simply that local models should not underestimate the importance of the global structure.

Thus all models presented in this chapter are consistent with one another, or rather they are not inconsistent. However, the links between meso- and macro-scopic models remain more tenuous than those between the meso-scopic and IF models. At the smaller scales, gross simplifications that ignore biology as well as characteristics of the organization of the brain have been made for the sake of a tractable solution. In particular, cortico-cortical connections, which by far outnumber the sub-cortical projections in the human brain, have been ignored by both micro- and meso-scopic models.

So many problems remain at the smaller scales that it is unlikely that links to more global models will be considered in the near future. However, the importance of cortico-cortical connectivity is undeniable, and the framework presented here allows for its inclusion. It is abstract enough to be consistent at all scales, provided suitable progress is made in both mathematics and computational power.


6.5 Practical Use of Models 


In this section we explore two ways in which the models presented in this chapter can be used to make practical progress in the field of epilepsy. In Section 6.5.1 we use the models to help us understand, investigate and infer how seizures are generated and how they can subsequently be terminated. In Section 6.5.2 we see how these models can be applied directly to tell us about the EEG, even when the models themselves are very simple.


6.5.1 Epileptic Seizure Generation 


Since relatively little is known about epileptic seizures (given the amount of research undertaken), it is not surprising that the dynamics of these models have been used to try to explain them. Current research focuses on understanding the generation and subsequent termination of seizures. To a large extent, the high-level factors that increase the likelihood of seizures, for example hyperventilation, sleep deprivation and stress, remain unexplored. Future research may reveal their impact upon the generation of seizures.


Mathematical model-based epilepsy research studies how param- 
eters in the model can be altered to create seizure-like activity. 
The link to the underlying physiology depends upon how closely 
the parameters describe the underlying physiology. 


While these models are not specifically designed for studying epileptic seizures, they can be used for this purpose because normal brains are also capable of seizing. If a seizure is a state into which the activity of a normal brain can be driven, the questions are then (a) how does it start and (b) how does it spread, as discussed in Chapter 1. How it stops is important as well. Because the meso-scopic models remain local, the question of spread cannot be addressed unless they are coupled to model additional cortical columns. The material presented here concentrates on the initiation and, in part, the termination of epileptic seizures.


6.5.1.1 Seizure Initiation 


Identifying the region of parameter space that corresponds to seizure-like activity can at most explain the state that the brain is in when seizing; by itself this information is insufficient to explain how the system is driven to this state. Transient analysis of the system, in the form of altering the system parameters over time, can be used instead. In [22] a single parameter is chosen and changed over time to explain the generation and termination of absence and focal seizures, and the mechanisms behind each of these types of epilepsy are explained in the context of system stability. One example of seizure transitions is presented in Figure 6.14, where the parameter ν_se, which describes the strength of interaction between cortex and sub-cortex, is varied as shown to drive a brain behaving normally into seizure.
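Driving a system across a stability boundary and back by changing one parameter can be sketched with the Stuart-Landau oscillator (the normal form of a Hopf bifurcation). This is a generic stand-in, not the meso-scopic model or the ν_se schedule of [22]: the parameter μ(t), its on/off schedule, the 3 Hz rotation frequency and the noise level are all hypothetical. For μ < 0 the fixed point is stable and only small noise-driven fluctuations appear; for μ > 0 the fixed point is unstable and a rhythm of amplitude √μ emerges.

```python
import cmath
import random

def simulate_transition(dt=0.001, t_end=60.0, omega=2 * cmath.pi * 3.0, noise=0.05, seed=1):
    """Stuart-Landau oscillator dz/dt = (mu(t) + i*omega) z - |z|^2 z + noise.

    mu < 0: stable fixed point (noise-driven background activity);
    mu > 0: limit cycle of radius sqrt(mu) (seizure-like rhythm).
    Returns |z| at every time step.
    """
    rng = random.Random(seed)
    rotate = cmath.exp(1j * omega * dt)        # exact rotation for one step
    z = 0j
    amplitude = []
    for n in range(int(t_end / dt)):
        t = n * dt
        mu = 1.0 if 20.0 <= t < 40.0 else -1.0  # hypothetical parameter schedule
        # Split step: exact rotation, then Euler-Maruyama for the radial and noise terms.
        z = z * rotate
        dW = complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) * (dt ** 0.5)
        z = z + (mu * z - abs(z) ** 2 * z) * dt + noise * dW
        amplitude.append(abs(z))
    return amplitude

def window_mean(amplitude, t0, t1, dt=0.001):
    """Mean amplitude between times t0 and t1 (seconds)."""
    segment = amplitude[int(t0 / dt):int(t1 / dt)]
    return sum(segment) / len(segment)

amp = simulate_transition()
print("before :", round(window_mean(amp, 10, 20), 3))
print("during :", round(window_mean(amp, 30, 40), 3))
print("after  :", round(window_mean(amp, 50, 60), 3))
```

As with Figure 6.14, such a sketch shows that a transition is possible, not why it happens.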

It is important to understand that the examples shown in Figure 6.11 and Figure 6.14 are instances rather than explanations. Virtually every parameter can be varied enough to drive the system to instability/seizure. Blindly searching through parameter space does not lead to a better understanding, especially since it is known that not all parameter changes are physiologically realistic. The research should be corroborated by experimental data or animal models.

FIGURE 6.14: Sample transition into and out of system stability to replicate transitions into and out of seizure. ν_se is varied as in (b), with additive stochastic white noise. In [22] such methods have been used to infer the importance of ν_se in the generation of these types of seizures, but its significance remains uncertain since many parameters can be used to generate similar waveforms.

Finally, a distinction must be made between the model used here and a model of an epileptic focus. The meso-scopic model was designed for a normal region of the brain. Focal epilepsies often involve an abnormal composition of neural tissue, whose behavior cannot necessarily be explained by these models even if the parameters are varied over a wide range. Thus the information provided here may simply describe how normal tissue is driven to seizure, in which case it is an explanation of spread rather than initiation. Nevertheless, this type of information should not be dismissed, because many epilepsies, even focal ones, begin with changes in the anatomy that are not obvious. To understand the behavior of abnormal focal neurons it may be necessary to employ the IF models presented in Section 6.2.
6.5.1.2 Seizure Termination by Electrical Stimulation


A seizure most often terminates naturally through unknown changes in parameter space. The mechanisms remain speculative, even though links to model parameters can sometimes be made. Intervention to abate a seizure is desirable because continuing seizures can cause brain damage. While drugs are a drastic means of doing so when appropriate medical supervision is available,¹⁰ an ideal solution is full automation of the process. That is, we want an automated method that can abort a seizure on demand once it has begun. Recently a lot of research has been dedicated to implantable electrical stimulators that stop a seizure once it has started. This is in contrast to stimulators such as VNS that deliver continuous stimulation so as to control seizures, described in Section 1.1.5.

Experimentally there is no clear evidence that this class of stimulators works. Here we briefly look at whether electrical stimulators that terminate seizures are at least theoretically viable. Since the models are based on physiology, does the introduction of external current or voltage sources change the dynamics? Can stimulation be used to move the seizing brain to a stable state?

Figure 6.15(a), (b) and (c) show the same seizure-like waveforms as in Figure 6.11 but with an external stimulus applied at time t = 1.5 s. The stimulus is shown in (d): a positive field that sums with the local fields. In all cases the seizure-like activity is stopped and the system is driven to stability. For the spindle instabilities representative of focal epilepsies ((a) and (b)), termination occurs when the stimulus is applied to either cortex or sub-cortex. For the 3 Hz instability in (c), representative of non-focal epilepsies, termination is only successful when the stimulus is applied to the thalamus. This is consistent with the view that non-focal epilepsies, and absence seizures in particular, are thalamic in origin. Whether the state to which the brain is driven by stimulation is physiologically safe, and whether the seizure returns once stimulation ceases, is not addressed by these simulations.

While a lot of work must be done to translate the indications contained in these figures into experimental or clinical trials, the analysis is useful because it implies that, at least theoretically, electrical stimulation can be used to terminate seizures. Before experimental trials can follow, closer links to the biology and greater mathematical formalism are necessary so that the correct stimulation parameters (e.g., duration, amplitude) can be inferred from the models. Future work may also help explain why electrical stimulation should work, if indeed it does.


¹⁰ Here we are talking about drugs used to abort seizures, that is, drugs administered on demand. This is in contrast to drugs used continuously to control seizures, described in Chapter 1.
FIGURE 6.15: Example transitions out of seizure with the application of an external electrical stimulus, shown in (d). The spindle instabilities are stopped by cortical stimulation, whilst the 3 Hz instability can only be stopped with thalamic stimulation. This is consistent with the belief that non-focal seizures are thalamic in origin. The simulations show that, at least theoretically, electrical stimulation may be used to stop seizures once they have started. In the panels above, signals after the stimulus have been amplified so that the difference between 'normal' and 'seizing' activity is more obvious.
6.5.2 Limitations of the EEG


So far in this chapter we have described models that attempt to mimic the physiology of the brain in as much detail as possible. At the same time, many simplifications were necessary so that the mathematical problem is tractable. Whilst making the problem easier to compute, removing detail inevitably led to models that were difficult to link to either physiology or macro-scopic measurements such as the EEG.

It is important to understand the limitations of what simplified models can tell us about a system, but having limitations does not make a model unusable. In this section we highlight this by discussing the limitations of the EEG using a simple linear model of the brain whose details are themselves not known.

Consider the EEG as the output of a dynamical system that represents the brain. Suppose for a moment that the actual brain dynamics can be represented by a discrete-time linear system

$$ z[n+1] = F\,z[n] + \eta[n]. \qquad (6.18) $$

This is an instance of 3.2, where η[n] is a disturbance or noise signal, presumably representative of external input into the cortex. η[n] may be modeled as white noise, but it suffices to assume that it is independent of the state z[n]. Also, assume that the EEG signal y[n] is simply a linear map from the state,

$$ y[n] = C\,z[n]. \qquad (6.19) $$

Let the dimension of z[n] be N and the dimension of y[n] be M, with M typically much smaller than N.

It is reasonable to postulate that the state is a very high-dimensional object (several billion variables at least). Over a limited period of time (the period over which the noise characteristics may be considered stationary) it is plausible to think of F as representing a collection of interconnected, undamped oscillators. This means that all the eigenvalues of the matrix F have modulus 1.

In a generic context it is possible to reconstruct the state z[n] from observations of the EEG signal, the system output y[n], over a sufficiently long observation window (say using the celebrated Kalman filter). In principle this means that at least N/M time samples are required; that is, to find z[n] it is necessary to observe at least y[n], y[n+1], y[n+2], ..., y[n + N/M]. With N of the order of a hundred billion and M of the order of a hundred, this means that of the order of a billion time samples are required! Unfortunately, such a long observation window (longer than 500 hours at 512 Hz sampling) is not at all compatible with the horizon over which the dynamics can be considered stationary. Nor are such observation windows practical.
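The N/M count reflects observability: each new sample of y adds at most M independent linear equations on the N unknown states. The sketch below uses random toy matrices F and C (hypothetical stand-ins for the unknown brain dynamics, with N = 6 and M = 2) to show the rank of the stacked observability matrix growing by M per sample until it reaches N, and then repeats the window arithmetic for N = 10^11, M = 100 and 512 Hz sampling. The helper names are illustrative.

```python
import numpy as np

def observability_rank(F, C, k):
    """Rank of the stacked matrix [C; CF; ...; CF^(k-1)] (k blocks of M rows)."""
    blocks, CFi = [], C.copy()
    for _ in range(k):
        blocks.append(CFi)
        CFi = CFi @ F
    return np.linalg.matrix_rank(np.vstack(blocks))

# Toy dimensions: N = 6 states observed through M = 2 channels.
rng = np.random.default_rng(0)
N, M = 6, 2
F = rng.standard_normal((N, N))
C = rng.standard_normal((M, N))
ranks = [observability_rank(F, C, k) for k in range(1, N // M + 1)]
print("rank after k blocks:", ranks)  # grows by M per block until it reaches N

def hours_needed(n_states, n_channels, fs_hz):
    """Lower bound of N/M samples, converted to hours at sampling rate fs_hz."""
    return n_states / n_channels / fs_hz / 3600.0

print(f"{hours_needed(1e11, 100, 512):.0f} hours")  # ~543 hours for N = 1e11, M = 100
```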

The problem is actually worse. With F having eigenvalues of modulus 1, the uncertainty in z[n] grows linearly over time, because the variance of z[n] grows linearly in time, driven by the variance of the noise input η[n]. Even if it were possible to reconstruct z[n] after having observed y[n], y[n+1], ..., y[n + N/M], this knowledge would be virtually meaningless, as the difference between z[n] and z[n + N/M] is itself of the order of N/M (in variance). The Kalman filter is essentially meaningless in this context, as the variance of the error between the state estimate and the state itself grows indefinitely.
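The linear growth of uncertainty can be checked numerically. This sketch assumes a hypothetical 2-D rotation for F (eigenvalues e^{±iθ}, both of modulus 1) and Gaussian white noise for η[n]; it illustrates the claim rather than modeling the brain. Because a rotation preserves length, E‖z[n]‖² with z[0] = 0 equals n times the total noise variance, here 2nσ².

```python
import numpy as np

def state_variance(n_steps, theta=0.3, sigma=1.0, trials=5000, seed=0):
    """Monte Carlo estimate of E||z[n]||^2 for z[n+1] = F z[n] + eta[n].

    F is a 2-D rotation (eigenvalues of modulus 1): it neither damps nor
    amplifies the state. eta[n] ~ N(0, sigma^2 I) is white noise.
    """
    rng = np.random.default_rng(seed)
    F = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    z = np.zeros((trials, 2))  # z[0] = 0 in every trial
    for _ in range(n_steps):
        z = z @ F.T + sigma * rng.standard_normal((trials, 2))
    return float(np.mean(np.sum(z**2, axis=1)))

for n in (100, 400):
    # Theory: 2 components x n steps x sigma^2.
    print(n, round(state_variance(n), 1), "theory:", 2 * n)
```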

The same idea can be gleaned from an information-theoretic point of view. The amount of information available at each measurement point is captured by the number of bits obtained, M × B = MB, where B is the resolution of the recorded EEG in bits per sample. Over a time horizon T the total number of bits gathered is T × M × B = TMB. On the other hand, z[n] contains N states, and knowing z[n] to within b bits of precision requires N × b = Nb bits. The brain map, being a collection of interconnected oscillators, is known to be measure preserving. Furthermore, the uncertainty in z[n] grows due to the noise, linearly in time, say at a rate g. So, in order to learn the state z[n] it is necessary that TMB exceed Nb + gT at some time T in the future. Rearranging TMB > Nb + gT gives T(MB − g) > Nb, so again the observation window has to be large enough, namely

$$ T > \frac{Nb}{MB - g}. \qquad (6.20) $$


Furthermore, for this to be meaningful it has been assumed that at any instant the amount of information gathered, MB, is actually larger than the amount of information destroyed by the noise, i.e., MB > g. This is a generous assumption that is impossible to be certain of, but it is used here because otherwise the reconstruction of z[n] quite simply cannot be done. The conclusion is again that it would take of the order of a few thousand hours of EEG observations to learn the state of a "stationary" brain.
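Equation 6.20 can be evaluated directly once values are chosen. In the sketch below only N = 10^11, M = 100 and the 512 Hz sampling rate come from the text; b = B = 16 bits and the two values of the noise rate g (in bits per sample) are hypothetical, since the text does not specify them. The function refuses to return a bound when MB ≤ g, matching the assumption MB > g.

```python
def min_window_samples(N, b, M, B, g):
    """Lower bound on the observation window from Eq. 6.20: T > N b / (M B - g).

    N: state dimension, b: bits of precision wanted per state variable,
    M: number of EEG channels, B: bits per EEG sample,
    g: bits of state uncertainty added by the noise per time sample.
    T is in samples; the bound only exists when M B > g.
    """
    if M * B <= g:
        raise ValueError("MB <= g: noise destroys information faster than it is gathered")
    return N * b / (M * B - g)

FS_HZ = 512.0  # sampling rate used earlier in the text

# Hypothetical values: b = B = 16 bits; g is unknown, so two illustrative choices.
for g in (0.0, 800.0):
    T = min_window_samples(N=1e11, b=16, M=100, B=16, g=g)
    print(f"g = {g:5.0f} bits/sample -> T > {T:.2e} samples = {T / FS_HZ / 3600:.0f} hours")
```

Larger b, or a noise rate g approaching MB, pushes the bound toward the thousands of hours quoted above.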

The above argument certainly undermines any attempt to use the embedding ideas presented in Section 3.3.4 to recover brain dynamics. Indeed, apart from the fact that in practice it is not possible to know the map F, which was tacitly assumed known in the above argument, the observation window required to achieve this "in principle" possible state reconstruction is not realistic at all.

Does this imply that it is not possible to detect or predict an epileptic event from an EEG record? No, because knowledge of the state is not required to make this decision. It is of course feasible to consider the condition of an epileptic seizure as being described by some binary-valued decision map from the state: E(z) = 1 indicating the presence of an epileptic seizure and E(z) = 0 indicating no active epileptic seizure. That such a map E exists is a consequence of the definition of state itself. Nevertheless, it is also clear that deciding on the presence of an epileptic seizure can be regarded as a decision based on the history of EEG observations as well; after all, that is how clinicians make this decision. The issue is one of how to learn this decision map from a collection of examples, which in this book has been approached using machine learning ideas, through the construction of an expert system.
6.6 Conclusions


The formulation of mathematical models of neural dynamics depends largely 
on the scale of interest, both spatial and temporal. This in turn depends on the 
application. The models presented here are geared toward the understanding 
of epileptic seizures, although they are in no way restricted to this application 
alone. 


In the understanding of neural dynamics of epileptic seizures 
it is important that a model be able to explain normal as well 
as seizure behavior because an epileptic brain has seizures only 
part of the time. Without this, transitions in and out of seizures 
cannot be studied. The differences between the epileptic brain 
and the normal brain must be encoded in the parameters of the 
system. 


Having said this, one of the major limitations of current models
is that their analysis is largely numerical and limited to stationary behavior.
Little work has been done toward understanding the time evolution of
neural dynamics; in particular, the transitions into and out of seizures remain
poorly understood.


At this stage neural models at the mesoscopic scale are still
being validated, that is, they are in the process of being compared
to real data to see if they have been formulated correctly. At 
the same time these models are being used to gain insight into 
the nature and source of epileptic seizures. 


Even with this limited understanding, it is still possible to use these
models in practical applications, for example to infer the limitations of the
EEG, as is done in Section 6.5.2. This casts further doubt on the use of
measures based on embedding theory for the prediction of epileptic seizures.
The next chapter discusses, in more general terms, the question of how
predictable seizures really are.
6.A Physiological Parameters and Notation


Notation 


Subscript e   Refers to excitatory neurons in the cortex.

Subscript i   Refers to inhibitory neurons in the cortex.

Subscript r   Refers to neurons in the TRN.

Subscript s   Refers to neurons in SRN.

Subscript n   Refers to input external to the network, most often
              excitatory.


Single Neuron Model Equation Symbols 


Membrane capacitance.

Membrane time constant.

Absolute refractory period.

Resting potential.

Firing threshold.

Reset potential (after firing).

Constant reversal potentials of excitatory/inhibitory neurons.

Q_a   Conductance of synapse from neuron type a.


Network Architecture 


Yab 


Number of neurons in network. 

Number of neurons of type a = e,i in the network. 
Number of synapses from neuron type b to each 
neuron of type a. 

Firing rates of neurons of type a. 

Strength of synaptic inputs from neurons of type b 
onto neurons a, with Yab = C'ab. 


Average Network Behavior 


Q      The maximum mean firing rate (of action potentials) of a neuron
       population.

V_TH   The mean threshold for a neuron to fire an action potential.

σ      The variance of V_TH, assumed to have a Gaussian distribution.

γ_e    A measure inversely proportional to the range of axons of cortical
       neurons: γ_e = v_e/r_e, where v_e is the propagation speed of the
       action potential and r_e is the characteristic axonal range.
6.B Summary of IF Model


Variables and Steady State Solutions (subscript ‘0’) 


ν(t), ν_0      Mean spiking rate.

CV(t), CV_0    Coefficient of variation of inter-spike interval.

μ(t), μ_0      Mean membrane potential.

σ(t), σ_0      Standard deviation of membrane potential.

τ(t), τ_0      Effective membrane time constant.


Model Specific Functions


V(t)           Membrane potential, defined between the inside and
               outside of a neuron's membrane.

I_leak(t)      Current due to passive leak of membrane.

I_spike(t)     Current describing the spiking mechanism of the neuron.

I_syn(t)       Current describing the effect of synaptic input to the
               neuron.

φ_a(t)         Recurrent input function to each neuron.

P(V,t)         Probability distribution of membrane potential V(t).

P*(V)          Stationary solution to P(V(t)).


Changes in the membrane potential V(t) of a neuron cell body are dictated
by the incoming current I_syn(t) and the outgoing currents I_leak(t) and I_spike(t):


$$ C\,\frac{dV(t)}{dt} = -I_{leak}(t) - I_{spike}(t) + I_{syn}(t). \qquad (6.21) $$


Spiking currents occur because the membrane voltage reaches the threshold V_TH.


Passive leak currents exist to return the membrane potential to the resting
state V_p:


$$ I_{leak}(t) = \frac{C}{\tau_p}\left(V(t) - V_p\right). \qquad (6.23) $$


The synaptic inputs here are conductance-based currents modeled as


$$ I_{syn}(t) = \sum_a Q_a \left(V(t) - V_a\right)\phi_a(t), \qquad (6.24) $$


where Q_a is the trans-membrane conductance of neurons of type a, V_a is the
corresponding reversal potential, and φ_a(t) is a function that defines the
recurrent input to neurons of type a. This recurrent input is determined by
the number of pre-synaptic spikes, summed temporally over a neural
population, accounting also for a transmission delay.

Connecting N sets of such equations to form networks of neurons quickly
becomes an intractable problem. Instead, simplifying assumptions can be made
over a group of neurons, as listed in Section 6.2, leading to a model that
describes the probability distribution of V(t). This distribution P(V(t))
follows the Fokker-Planck relationship, which for the stationary case
P*(V) = P(V(t → ∞)) is given by


$$ \frac{\partial P^*(V)}{\partial t} = 0 = D^*\,\frac{\partial^2 P^*(V)}{\partial V^2} + \frac{\partial}{\partial V}\Big[(V - M)\,P^*(V)\Big], \qquad (6.25) $$


with M and D* corresponding to the mean and variance of P*(V),
respectively. The advantage of this model is that analytical solutions exist
for ν*, CV*, μ*, σ* and τ*. These, along with their derivations, can be found
in [108]. Their simulated values agree with in vivo observations of a group
of neurons.
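Reading Eq. (6.25) as written, with diffusion coefficient D* and a drift toward M, the stationary solution is a Gaussian with mean M and variance D*. For that density the probability flux −D* ∂P*/∂V − (V − M)P* vanishes identically. The sketch below checks this numerically using central finite differences. The values of M and D* are arbitrary, and the helper names are illustrative only.

```python
import math


def gaussian(v, M, D):
    """Normal density with mean M and variance D."""
    return math.exp(-(v - M) ** 2 / (2.0 * D)) / math.sqrt(2.0 * math.pi * D)


def fp_residual(p, v, M, D, h=1e-3):
    """Finite-difference estimate of D*P'' + d/dV[(V - M) P] at the point v.

    For a stationary solution of Eq. (6.25) this should be (numerically) zero.
    """
    d2p = (p(v + h) - 2.0 * p(v) + p(v - h)) / h ** 2

    def drift_term(x):
        return (x - M) * p(x)

    d_drift = (drift_term(v + h) - drift_term(v - h)) / (2.0 * h)
    return D * d2p + d_drift


M, D = 0.5, 0.25  # arbitrary mean and variance


def p_star(v):
    return gaussian(v, M, D)


def p_wrong(v):
    return gaussian(v, M, 2.0 * D)  # wrong variance, for contrast


grid = [M + 0.1 * k for k in range(-20, 21)]
print(max(abs(fp_residual(p_star, v, M, D)) for v in grid))   # close to 0
print(max(abs(fp_residual(p_wrong, v, M, D)) for v in grid))  # clearly nonzero
```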


6.C Summary of Phenomenological Model 


Model Specific Functions and Operators 


Q_a(t)   Mean spiking rate of a group of neurons of type a = e, r, s.
         Also known as pulse density. This is related to the cell body
         potential V_a.

V_a(t)   Mean membrane potential after inputs from the dendrites have
         been summed and filtered. This is relative to a resting potential.

φ_a(t)   Potential fields induced by a group of neurons of type a = e, i,
         r, s.

D_γe     Characterizes the low pass filtering effects that cortical
         damping has on the potential in the cortex.

D_αβ     Characterizes the low pass filtering effects that dendrites
         have on incoming signals.


Parameters (Steady State)


ν_ab     Parameters that characterize the effects that neurons of type
         b have on neurons of type a.
G_ab     Gain of each stage in the linearized system, proportional to ν_ab.
α, β     Parameters that characterize low pass filtering of dendrites.
         In biology these parameters represent the inverse rise and fall
         times of the potential produced by an impulse at a synapse.
t_0/2    Propagation delay between cortical and subcortical systems.
φ_n      The external input to the brain, assumed constant.
The mean spiking rate (see Figure 6.4) is described by the equation


$$ Q_a(t) = \frac{Q}{1 + e^{-(V_a(t) - V_{TH})/\sigma}}, \qquad a = e, s, r, \qquad (6.26) $$


where Q is the maximum firing rate of a neural population and σ is the
variance of V_TH, all quantities defined in Appendix 6.A.

The potential fields φ_a(t) generated in the dendrites are proportional to
the firing rate, as explained in Chapter 2. Excitatory neurons in the cortex
experience low pass filtering (D_γe) because their axons have non-negligible
lengths:


$$
\begin{aligned}
D_{\gamma_e}\,\phi_e(t) &= Q_e(t), \\
\phi_i(t) &= Q_e(t), \\
\phi_s(t) &= Q_s(t), \\
\phi_r(t) &= Q_r(t), \\
D_{\gamma_e} &= \frac{1}{\gamma_e^2}\frac{d^2}{dt^2} + \frac{2}{\gamma_e}\frac{d}{dt} + 1.
\end{aligned}
$$


In general the fields φ_a(t) are assumed proportional to firing rates, with the
exception of φ_e(t), which is damped because cortical excitatory neurons have
long axonal projections. Notice that φ_i(t) is proportional to Q_e(t), another
simplification that makes the model more tractable by removing a variable.

The mean membrane voltage V_a(t) of each neuron type is determined by
the strengths of its interactions with the fields φ_b(t). Dendrites filter the
incoming dendritic input (D_αβ):


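$$
\begin{aligned}
D_{\alpha\beta}V_e(t) &= \nu_{ee}\phi_e(t) + \nu_{ei}\phi_i(t) + \nu_{es}\phi_s(t - t_0/2), \\
D_{\alpha\beta}V_r(t) &= \nu_{re}\phi_e(t - t_0/2) + \nu_{rs}\phi_s(t), \\
D_{\alpha\beta}V_s(t) &= \nu_{se}\phi_e(t - t_0/2) + \nu_{sr}\phi_r(t) + \nu_{sn}\phi_n(t), \\
D_{\alpha\beta} &= \frac{1}{\alpha\beta}\frac{d^2}{dt^2} + \left(\frac{1}{\alpha} + \frac{1}{\beta}\right)\frac{d}{dt} + 1.
\end{aligned}
$$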


The non-linear components Q_a(t) in the model are approximated by linear
functions around an operating point, defined as an equilibrium at which V_a(t),
a = e, r, s, are stationary. The resulting linearized system is described by the
transfer functions in Figure 6.6. Formally, d/dt is replaced by s, giving a simple
algebra with which to quickly compute relationships between variables. Each
dendritic filtering stage D_αβ, for example, becomes


$$ \frac{1}{(1 + s/\alpha)(1 + s/\beta)}, $$


where s = j2πf for the sinusoidal steady state, or s ∈ ℂ more generally.
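The replacement of d/dt by s can be sketched numerically. This assumes D_αβ has the standard second-order form s²/(αβ) + (1/α + 1/β)s + 1. That polynomial factors as (1 + s/α)(1 + s/β), so the dendritic stage acts as a second-order low-pass filter with unit gain at DC. The values α = 50 s⁻¹ and β = 200 s⁻¹ below are placeholders, not the model's fitted parameters.

```python
import math


def operator_polynomial(s, alpha, beta):
    """D_alpha_beta with d/dt replaced by s."""
    return s * s / (alpha * beta) + (1.0 / alpha + 1.0 / beta) * s + 1.0


def dendritic_response(f, alpha, beta):
    """Transfer function 1 / ((1 + s/alpha)(1 + s/beta)) evaluated at s = j*2*pi*f."""
    s = 2j * math.pi * f
    return 1.0 / ((1.0 + s / alpha) * (1.0 + s / beta))


alpha, beta = 50.0, 200.0  # placeholder rates in 1/s
for f in (0.0, 1.0, 10.0, 100.0):
    print(f"{f:6.1f} Hz  |H| = {abs(dendritic_response(f, alpha, beta)):.3f}")
```

The product of `operator_polynomial` and `dendritic_response` at the same s is 1. This is simply the statement that the filter inverts the differential operator.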


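The gains G_ab can be calculated from the steady-state behavior at the
equilibria V_a* and φ_a*:


$$ G_{ab} = \left.\frac{dQ_a}{dV_a}\right|_{V_a = V_a^*}\nu_{ab} = \frac{Q_a^*}{\sigma}\left(1 - \frac{Q_a^*}{Q}\right)\nu_{ab}. $$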


Note that there may be many equilibrium solutions. 
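With the sigmoid written as in Eq. (6.26), dQ_a/dV_a = (Q_a/σ)(1 − Q_a/Q). This means the gain G_ab can be computed directly from the equilibrium firing rate. The sketch below compares that closed form with a central-difference derivative. All numerical values (Q, V_TH, σ, V_a*, ν_ab) are illustrative placeholders rather than the model's parameters.

```python
import math


def firing_rate(V, Q_max, V_th, sigma):
    """Sigmoid of Eq. (6.26): Q / (1 + exp(-(V - V_TH) / sigma))."""
    return Q_max / (1.0 + math.exp(-(V - V_th) / sigma))


def gain(V_star, nu_ab, Q_max, V_th, sigma):
    """G_ab = (Q_a*/sigma)(1 - Q_a*/Q) nu_ab, evaluated at the equilibrium V_a*."""
    q = firing_rate(V_star, Q_max, V_th, sigma)
    return (q / sigma) * (1.0 - q / Q_max) * nu_ab


# Placeholder values (rates in 1/s, potentials in mV).
Q_max, V_th, sigma = 250.0, 15.0, 6.0
V_star, nu_ab = 10.0, 1.2

h = 1e-4
numeric = (firing_rate(V_star + h, Q_max, V_th, sigma)
           - firing_rate(V_star - h, Q_max, V_th, sigma)) / (2 * h) * nu_ab
print(gain(V_star, nu_ab, Q_max, V_th, sigma), numeric)
```

Because each equilibrium V_a* yields its own slope, every one of the possibly many equilibria gives a different set of gains G_ab.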
The overall transfer function H(s) between input φ_n(s) and output φ_e(s)
On the Predictability of Seizures


“The farther backward you can look, the farther forward you are 
likely to see." 


- Sir Winston Churchill, (1874-1965) 


Prediction is a difficult task. To be able to forecast the future, an in-depth
knowledge of both past and present is required. The more complex a system
is, the more information is necessary to predict the same distance into
the future. Although Winston Churchill’s words were probably spoken in a
more philosophical sense, they also hold true for numerical analysis: how far
back in time must we look in order to be able to tell anything about the future?

Let us first define what we mean by prediction and how it differs from 
detection. In a real system, measurements are only available from the past. A
detector uses these measurements to make a decision about this past and the 
present. A predictor, on the other hand, uses them to make a statement about 
the future. All predictors are forms of detectors, the difference being in the 
time at which the measurement is taken relative to the time for which the 
detection is made. For example, a seizure detector tells us that a seizure has 
begun, whilst a predictor tells us if a seizure will begin. 

The weather example is once again useful in pointing out the difficulties
of forecasting the future. Its prediction has become a daily convenience to
many of us: we turn our TVs on and expect that there is some truth in what
we are told the weather will be like tomorrow, the day after, or even next week.
But how are these predictions made? The most comprehensive and versatile
solution is to create a physical model of the Earth's atmosphere that accounts
for all factors that affect the weather at any point on Earth. Humidity,
topography, air pressure and even how many people decide to drive to work
all need to be considered. This is an incredibly complex system for which even
our most modern modeling tools may not be sophisticated enough.


“When the number of factors coming into play in a phenomenolog- 
ical complex is too large, scientific method in most cases fails us. 
One need only think of the weather, in which case prediction even 
for a few days ahead is impossible. Nevertheless no one doubts that 
we are confronted with a causal connection whose causal components
are in the main known to us. Occurrences in this domain
are beyond the reach of exact prediction because of the variety of 
factors in operation, not because of any lack of order in nature". 


- Science, Philosophy and Religion, Albert Einstein, (1879-1955) 


Thus, although a physical model exists, it may be too complex to realize, even
with appropriate simplifications. In such cases an alternative approach is to base
predictions on the assumption of a stochastic model. These models forecast
future events by restricting the range of possibilities, unlike a purely
deterministic model, in which an exact outcome is predicted¹. Statistics are gathered
over time and future events predicted based only on past observations. We 
limit our prediction to allow for variability, so we may predict that it is likely 
to rain, rather than it will rain. The more information that is collected, so 
long as this information is still valid, the more accurate the forecasted range 
is likely to be because we are predicting a statistic drawn from a probability 
distribution. 


The statistics of a purely stochastic model do not themselves 
explain anything about the physical processes involved (even 
though physical processes may be inferred through the study of 
these statistics). These models are known as black box or data 
driven methods. 


A model does not have to be purely physical or purely black box. A gray
box model is one in which some aspects of a system are known and modeled by a
physical process, and others are unknown and modeled by black box components.
Weather prediction is an example of a gray box stochastic model because 
physical models with built-in allowance for the variability in measurements 
are used. They are effective because monitoring of weather patterns has been 
in place for many decades, and the statistics can be used by experts to in- 
terpret the outputs of the physical models. Although a full physical model 
is less limited in what it can tell us than black or gray box models, for such 
a complex system a gray box stochastic model is today the only practical 
solution. 

The same observations hold true for seizure prediction: epilepsy is
undoubtedly a behavior of one of the most complex systems that humans are
trying to understand. Current “predictors” of seizures rely on the
assumption of stochasticity simply because our models of the brain, presented in the

¹ In Chapter 6 physical models with stochastic components are described. This is
unlike purely stochastic models, in which none of the physical signaling is considered
in the mathematics.
FIGURE 7.1: The predictability of a map is affected by the measurement
precision. (a) and (b) show the map described by Equation 7.1, where z[n]
is measured using 2 values (one bit precision) and 4 values (2 bits precision),
respectively. In (a) one bit cannot be used to calculate z[n + 1] because both
ranges of z[n] span the entire range of z[n + 1]. In (b) two bits in z[n] divide
z[n + 1] into two ranges, so z[n + 1] can be predicted to an accuracy of 1
bit. (c) shows how unpredictable even this simple map is in the presence of
small errors: a difference of 0.001 in z[0] between the trajectories
results in very different behavior after only 10 iterations.


previous chapter, are not developed enough to be usable for this task. Instead,
black box statistical analyses applied to collected data such as the EEG are used
to attempt a prediction.

The predictability of a system depends on how the collected data are
measured and is thus a function of the technology used to make the measurements.
This applies to both physical and stochastic models. Think, for example, of a
simple one-dimensional system whose purely deterministic map is described by


$$ z[n + 1] = 4z[n](1 - z[n]), \qquad 0 < z[0] < 1. \qquad (7.1) $$
This says that if a measurement of the current sample z[n] exists, then the
value of the next sample z[n + 1] can be predicted. This map is restricted to
the range 0 ≤ z[n] ≤ 1, as shown in Figure 7.1. Now consider a measurement of
z[n] described by only 1 bit, that is, the only information available is whether
z[n] < 0.5 or z[n] ≥ 0.5, as in Figure 7.1(a). Then this measurement cannot be
used to predict z[n + 1] because the measurement error results in a prediction
range that spans the entire range of z[n + 1]. However, if 2 bits are used
(the range of z[n] is divided into 4, as in Figure 7.1(b)), then z[n + 1] can be
predicted to an accuracy of 1 bit. In this system z[n + k] can be predicted to
1 bit accuracy if z[n] is known to k + 1 bits accuracy.
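For a single step, the claim can be checked exactly. The sketch below is an illustration, not the method used to produce Figure 7.1. It splits [0, 1] into 2^b equal bins, maps each bin through Equation 7.1, and counts how many distinct ranges of z[n + 1] result. Because 0.5 is always a bin edge, the map is monotone on every bin, so the image of each bin is fixed by its endpoints. Exact rational arithmetic avoids rounding issues.

```python
import math
from fractions import Fraction


def logistic(z):
    """Equation 7.1: z[n + 1] = 4 z[n] (1 - z[n])."""
    return 4 * z * (1 - z)


def image_ranges(bits):
    """Distinct ranges of z[n + 1] when z[n] is known only to `bits` bits."""
    width = Fraction(1, 2 ** bits)
    ranges = set()
    for k in range(2 ** bits):
        lo, hi = logistic(k * width), logistic((k + 1) * width)
        ranges.add((min(lo, hi), max(lo, hi)))  # map is monotone on each bin
    return ranges


def predicted_bits(bits):
    """Information about z[n + 1], in bits, obtained from a `bits`-bit measurement."""
    return math.log2(len(image_ranges(bits)))


for b in (1, 2, 3, 4):
    print(b, "bits of z[n] ->", predicted_bits(b), "bits of z[n + 1]")
```

A 1-bit measurement yields 0 bits about z[n + 1], and 2 bits yield 1 bit, matching panels (a) and (b). More generally, b bits of z[n] yield b − 1 bits of z[n + 1]: one bit is lost per iteration.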


Successful prediction requires the extraction of information de- 
rived from the measurement of the past and the knowledge of 
the dynamics. 


In Chapter 2 the many limitations of EEG measurements are described in 
detail. Whilst the recording equipment itself is reasonably precise, the volume
conducting properties of the head make the EEG a macroscopic representation
of millions of neurons at a time. Even if these volume conducting properties 
did not average out the activity of large regions of the brain, recall that in 
the Preface to this book we estimated that the EEG reveals 1 bit of infor- 
mation per second for every cortical column (10? neurons). Predictors that 
rely purely on the EEG attempt to forecast the future with very limited
information, which, given the complexity of the brain, seems unlikely to succeed.
In any case, whether EEG data are sufficiently accurate or localized to make
successful predictions remains doubtful.

Finally, prediction is difficult because a system may not be very predictable
in the first place. Let's examine Equation 7.1 again. If z[n] is known exactly,
to infinite precision, then the future of z can be predicted forever. Because
z[n] cannot be known exactly, any error multiplies by roughly a factor of 2
with every iteration of the map. With an error of 0.01%, z[n] can be predicted
only 13 steps ahead to 1 bit accuracy; with an error of 0.1%, only 9 steps; and
only 6 steps when the error must be contained to less than 1%. This is the
essence of chaos theory, colloquially referred to as the butterfly effect:
small errors can have large consequences, as shown in Figure 7.1(c). A
system may have inherent limitations as to how predictable it is because of
(1) measurement error, (2) stochastic fluctuations in the system or the model
of the system, or (3) a model used to make predictions that does not
approximate system behavior to sufficient accuracy.
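These horizons follow from a back-of-the-envelope calculation. If an initial error e₀ doubles every iteration, it stays below a tolerance ε for ⌊log₂(ε/e₀)⌋ steps. The sketch below assumes exact doubling and takes a tolerance of 0.5 (half the unit interval) to mean "1 bit accuracy". Under these assumptions its results fall within one step of the figures quoted in the text. Whether they match exactly depends on whether the step at which the tolerance is first exceeded is counted.

```python
import math


def prediction_horizon(initial_error, tolerance, growth=2.0):
    """Number of iterations for which initial_error * growth**n stays <= tolerance."""
    if not 0 < initial_error <= tolerance:
        raise ValueError("need 0 < initial_error <= tolerance")
    return math.floor(math.log(tolerance / initial_error) / math.log(growth))


# Initial errors of 1e-4 and 1e-3 (0.01% and 0.1% of the unit interval).
print(prediction_horizon(1e-4, 0.5))   # 12
print(prediction_horizon(1e-3, 0.5))   # 8
print(prediction_horizon(1e-4, 0.01))  # 6
```

The `growth` parameter is exposed because the factor of 2 is only an average for this map. Other systems lose information at other rates.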

Irrespective of whether physical, gray or black box models are used, and
whether the EEG as a measurement is appropriate, there is a more fundamental
question that has rarely been asked: “are seizures predictable?” and, if so,
“how predictable are they?” They have been assumed predictable because up
to 50% of people suffering from epilepsy are able to predict their own seizures,
usually with warning symptoms such as headaches or mood alterations that
appear well before the clinical onset [63, 142]. In some cases people are able
to learn how to use these warning symptoms (also known as auras) to prevent
their incidence [161]. However, short term auras are accepted as the point at
which a seizure has already begun without yet impairing consciousness. Long
term auras have been linked to the build-up of epileptic activity. Auras are
therefore detectors rather than predictors of epilepsy². To the best of our
knowledge, no systematic study exists that determines whether seizures are
indeed predictable.

Given that auras often occur well before clinical onset, the question “is
there even a need for seizure prediction?” is also valid. The purpose of seizure
prediction algorithms is to provide sufficient warning to allow for some form
of intervention. This is particularly important for patients who cannot be
treated surgically or pharmacologically: by knowing when a seizure is
imminent, they can retain some level of control over their own lives. It is also useful
for the more timely delivery of fast acting drugs or electrical stimulation,
in this way reducing the side effects associated with traditional pre-emptive
treatments. But if the development of seizures is slow enough that the
electrographic onset (sometimes manifest as auras) is present some time before
impairing consciousness, then the same can be achieved through the detection
rather than prediction of these symptoms. It was shown in Section 5.4 that
this is already (sometimes) feasible for intra-cranial records with detection
algorithms available today. Some research groups (e.g., [13]) have decided that
this direction is the most logical. Although preliminary results are promising,
however, the question of whether a seizure can be aborted once it has started
has not yet been adequately answered [111] (refer to Section 6.5.1).

Detection-based treatment has not made prediction of seizures redundant.
Preventing seizures from occurring could be more beneficial than stopping
them once they have begun. For example, there are cases in which people
with epilepsy feel the build-up of an oncoming seizure over hours, days or even
months, during which they crave a seizure so that they may feel better.
Although the existence of this build-up is contentious, it supports the
hypothesis of seizures as reset mechanisms: the solution to a problem rather than
the cause³ [72]. If prediction were possible, then this reset could be performed
externally, thereby removing the need for a seizure: a preventative rather than
reactive measure.


² Another argument for the existence of a pre-seizure state is the ability of some dogs
to "predict" seizures. However, these dogs also predict pseudo-seizures and are once again
examples of detectors rather than predictors, most likely through motor-based symptoms
[35, 85].

³ It is also possible, however, that this desire for the seizure reflects a psychological need of
the patient to associate something positive with the situation.
On a more practical note, prediction rather than detection provides a longer
warning time in which action can be taken. This is important because
breaching the blood-brain barrier in the delivery of fast acting drugs and the
use of electrical stimulation are both novel methods for which much research is
still necessary. If longer horizons were provided, then traditional drugs that take
minutes rather than seconds to act could be used today with a successful predictor.

Predictors today are far from clinically applicable. Most prediction
algorithms are based on many assumptions. They assume the existence of a
dynamic and deterministic model, even though this model is not known.
They assume that the measured EEG adequately represents this model, which
may not be true (see Chapter 2). And they assume that relatively short
amounts of data can be used to infer statistical invariants such as Lyapunov
exponents and synchronicity, which was disproved in Chapter 3.
Although it is not impossible that dynamical characteristics can be tracked
in this way, it seems unlikely that these methods are reliable considering the
volatility observed in EEG records.

The performance of these statistics as predictors creates further doubt as
to what they are really detecting in the first place. They work only in very
specific cases and have for the most part been invalidated in recent
publications when tested on larger and independent datasets [11, 61, 62, 88, 101]. In
some cases the methods were shown to perform no better than random (see
[111] for details). Furthermore, nothing is known about the nuances of the
data being tested: What do the EEG records leading up to the seizure look
like? What do the inter-seizure events look like? Are they stereotyped? Is
it possible that these methods are effectively a form of signature recognition,
and are therefore yet another example of a detector rather than a predictor?
Without knowing the exact details of the data under test (beyond the typical
information such as sampling rate, number of channels, etc.), these questions
are unanswerable.

Methods based on signature recognition exploit only phenomenological 
information. They do not attempt to understand anything about the gen- 
eration of the seizures. They cannot, themselves, be used to infer anything 
generally about the dynamics of epilepsy, and seizures that are not typical are 
likely missed. It has been argued, and it could be true, that these statistics 
are indeed measuring a slow change in the EEG that is representative of the 
dynamics. Patients for whom this does not work can be explained by a
recruitment that is too fast to notice; it all makes sense under this paradigm. Just
as valid, however, is the argument that what is being detected is a stereotyped
pattern leading to a seizure. We must accept this possibility. Until
continuous, standardized data sets that include a blind validation phase are
available, prediction (as opposed to detection) of epilepsy cannot be verified.

Fortunately, measures have been put in place to make a more unified effort
possible. The International Workshop on Seizure Prediction is a regular
meeting held every two years that promotes collaboration and provides common
datasets for testing. The third such meeting was held in Freiburg, Germany,
in 2007. The standardization of datasets, as well as the introduction of
validation tests that compare algorithms against random predictors [160, 195, 196],
provides a more rigorous platform, if not a solution, for the problem of seizure
prediction.
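The idea of a random-predictor baseline can be made concrete with a minimal sketch. It rests on one common assumption, which is not necessarily the exact formulation used in [160, 195, 196]. The "random" predictor raises alarms as a Poisson process with rate r. A seizure counts as predicted if at least one alarm falls in the window of length H preceding it. Analytically, the sensitivity of such a predictor is 1 − exp(−rH), and the simulation below checks this. A genuine algorithm would have to beat this figure at the same alarm rate. The rate and window values are placeholders.

```python
import math
import random


def analytic_sensitivity(alarm_rate, window):
    """P(at least one Poisson alarm in a window of the given length)."""
    return 1.0 - math.exp(-alarm_rate * window)


def simulated_sensitivity(alarm_rate, window, n_seizures=20000, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_seizures):
        # Poisson alarms are memoryless: the wait from the start of the
        # window to the first alarm is exponential with the alarm rate.
        if rng.expovariate(alarm_rate) < window:
            hits += 1
    return hits / n_seizures


rate, window = 0.15, 2.0  # placeholders: 0.15 alarms per hour, 2 hour window
print(analytic_sensitivity(rate, window), simulated_sensitivity(rate, window))
```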

Successful or otherwise; predictors or detectors; dynamical indicators or 
stochastic fluctuations, current prediction research has not addressed the fundamental
question: how predictable are seizures? The work in this chapter is an attempt 
to answer this, in part, with particular interest placed in the identification of 
memory in the epileptic brain. If no memory exists and seizures are caused by 
a sudden, abrupt transition then they are unpredictable and therapy should 
concentrate on their detection. If, on the other hand, memory exists then 
arguably seizures can be predicted. 

How long this memory is determines what avenues of treatment are pos- 
sible as well as the type of data that can be used to make these predictions. 
In this chapter data sequences consist of only the times of epileptic events. 
These times are analyzed to determine if memory exists in the generation of
epileptic seizures. These datasets span from 3 hours to 25 years, and at first 
glance look like random events that cannot be predicted. The aim here is 
to determine if some structure can be identified in this seemingly stochastic 
process, in which case information about future seizure times can be obtained 
from past seizure events. 

The methods used to identify the existence of memory involve the detection
of scale-invariance in these datasets. This is a phenomenon in which
behavior at different time scales is similar. Scale-invariance does
not itself indicate memory unless it is of a specific type. Because the words
scale-invariance, power-laws, long-range memory and many other terms that 
are all relevant to this study are frequently used, confused and abused in 
literature, the next section is dedicated to defining and discussing what this 
terminology means. A description of the datasets is then given in Section 7.3, 
and the analysis and discussion of results are presented in Section 7.4 and 
Section 7.5 respectively. 




7.1 Predictability — Terminology Made Clear 


In the context of predictability, the notions of self-similarity, scale-invariance, 
self-organization, criticality, power-laws and long-range dependency are often 
used liberally and, to confuse things, interchangeably, without much
explanation. This section clarifies some of these concepts.

We start with the most general concept, self-similarity. An arbitrary en- 
tity (e.g., a picture, an object, a time-series) is said to be self-similar if the 
properties of the entity are the same when it is looked at as a whole or in 
parts. A common example is that of fractals, the most famous of which is 


270 Epileptic Seizures and the EEG 




FIGURE 7.2: Mandelbrot set. (a) is the complete Mandelbrot set, (b)
zooms into part of (a), and (c) zooms into part of (b). Self-similarity
is evident because the same types of structures are present regardless
of the zoom scale. The diagrams were generated using the software
Fractal Explorer, available for download on Matlab Central File Exchange
(http://www.mathworks.com/matlabcentral/fileexchange/).


the Mandelbrot set shown in Figure 7.2. Panel (a) shows the Mandelbrot set in its
entirety. Enlarging a part of the set, as in (b), results in an image that shows
patterns that are very similar to the whole. This continues indefinitely at
finer scales. The properties at different scales do not look exactly the same as
each other, but similar types of structures exist. Self-similarity exists in the
real world — coastlines, snowflakes and even the flows and eddies in Leonardo 
da Vinci's drawings are considered self-similar (within certain spatial scales) 
[18]. 

Scale-invariance is an instance of self-similarity. The Mandelbrot set in 
Figure 7.2 is spatially scale invariant because similar properties are present at 
all spatial scales of the picture. More relevant to this chapter is temporal scale- 
invariance, which for a stochastic process implies that the statistical properties 
at different time scales (e.g., days versus hours versus minutes) effectively 
remain the same. Scale-invariance has been studied for earthquake frequency, 
internet traffic and economic data [130], and has also been identified in many 
biological systems including the timing of ion-channel opening in neurons, 
auditory nerve fiber action potentials and human heartbeats [7]. 

Self-organization is the ability of a system (of interest here, the brain)
to organize itself into a state of increased complexity without the need for
external interference. The system can change thanks to properties of the
network and its dynamics and not because of external input. Self-organized
criticality (SOC) is then the ability of this system to evolve from a state that
is not self-similar or scale-invariant to one that is. These concepts are
explained very well in [15]. The identification of SOC in brain function is
an active area of research because it is believed that understanding why it
exists will reveal important mechanisms in brain function. For example, in
[17] a study of spontaneous activity in in vitro brain slices revealed that SOC 
exists in the ‘avalanches’ of activity that occur, a trend that may be important 
for optimal information transfer and stability in cortical networks. In [153] 
evidence of self-organized criticality is found in the focus of some types of 
epilepsies, where the seizure itself is the self-similar state. These observations 
are important in the development of models because they must be capable 
of replicating this behavior. Conversely these observations can help validate 
models. For example the integrate-fire models described in Chapter 6 have 
been shown to exhibit SOC-like behavior [197]. 

So, regardless of the presence of SOC, systems can be scale-invariant (and 
therefore self-similar) in both the spatial and temporal domain. How can the 
scale-invariance be identified? From here on we will concentrate on temporal 
scale invariance because of its relevance to later sections. The concepts pre- 
sented here are applicable also to statistics that can be extracted from spatial 
information. 

Implied by temporal scale-invariance is that short/small events occur fre- 
quently and long/large events occur infrequently, but with non-negligible 
probability. If an arbitrary function f(t) describes a temporal scale-invariant 
process, then the log-log plot of f(t) versus time t is a straight line. That is, 


$$\log f(t) = -\beta \log t + c, \qquad (7.2)$$


or alternatively


$$f(t) = e^{c}\, t^{-\beta} = C\, t^{-\beta}. \qquad (7.3)$$


This is known as a power-law with positive scaling exponent⁴ β and
constant c = log C (with C > 0). Thus β is the negative of the gradient of the
straight line formed in the log-log domain. f(t) displays scale invariance because
dilating or contracting time t by a constant k (that is, changing the temporal
scale) simply multiplies f(t) by k^(−β), also a constant. The power-law becomes


$$\log f(kt) = -\beta \log(kt) + c = -\beta \log t - \beta \log k + c, \qquad (7.4)$$

and the dilation results in an additive term of −β log(k) in the log-log
domain. The gradient/exponent of the power-law remains the same upon
dilation.
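The scale-invariance argument of Equations 7.2 to 7.4 can be checked numerically. The Python sketch below uses an illustrative power-law f(t) = C t^(−β) with C = 2 and β = 1.5. Both values are chosen arbitrarily and are not taken from any data. The sketch confirms that dilating time by k multiplies f by the same constant k^(−β) at every t, while the log-log gradient stays at −β.

```python
import math

def power_law(t, C=2.0, beta=1.5):
    """f(t) = C * t**(-beta), the form of Equation 7.3 (C and beta are illustrative)."""
    return C * t ** (-beta)

def loglog_slope(f, t1, t2):
    """Gradient of the line through (log t1, log f(t1)) and (log t2, log f(t2))."""
    return (math.log(f(t2)) - math.log(f(t1))) / (math.log(t2) - math.log(t1))

k = 3.0  # dilation factor applied to time
for t in (0.5, 1.0, 10.0, 250.0):
    # Every ratio equals k**(-beta) = 3**(-1.5), independent of t
    print(t, power_law(k * t) / power_law(t))

# The gradient is -beta both before and after dilation
print(loglog_slope(power_law, 1.0, 100.0))
print(loglog_slope(lambda t: power_law(k * t), 1.0, 100.0))
```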

The word “power-law” thus describes a particular relationship that may be 
present in an arbitrary function. By itself it does not mean much - events with 
random distributions that obey a power-law are easily generated. However 
under certain conditions if power-laws are observed in the statistics of a system 
then information about the underlying process, such as its predictability, may 
be inferred. 


⁴Here the negative sign in front of β is there to indicate a decay, or negative gradient,
in the relationship. This is convenient for the analysis in this chapter but not necessary in
the general sense of a power-law.
Scale-invariance is an example of self-similarity, in which the
statistical properties at large scales are the same as those at
small scales. Scale-invariance may be temporal or spatial.
Self-organized criticality is the ability of a system to make its
dynamics scale-invariant, without external input, when a critical
state is reached.


If the log-log plot of a function f(t) versus time t is a straight
line with gradient −β, then the process is said to obey a power
law, which is a special case of a scale-invariant process.


We now introduce the concept of short-range versus long-range dependence. 
The autocorrelation of f(t) (defined in Equation 3.14 in Chapter 3) describes 
how much events at different times depend on one another. If the auto- 
correlation decreases very fast then current events can say very little about 
future events. This is known as short-range dependence (SRD). The most
trivial example of an SRD process is a sequence of independently drawn random
variables, whose auto-correlation at every lag other than zero is negligible. This
is the shortest type of dependence, i.e., none at all.

However, if the auto-correlation decreases more slowly then there is some 
information about the future in the present event. If the decay is particularly 
slow then the dependence spans far into the future. This is known as long- 
range dependence (LRD). Processes that exhibit LRD are thus said to have 
long memory. Formally, the area under the auto-correlation of an SRD process 
is finite, and the decay near zero is very fast, whereas the area under the auto- 
correlation of an LRD process is infinite. 
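The difference between a finite and an infinite area can be illustrated with two model auto-correlations, chosen here purely for illustration. The first is an exponential decay ρ(k) = φ^k (SRD). The second is a power-law decay ρ(k) = k^(α−1) (LRD, with α = 0.8). The partial sums of the first converge, while those of the second keep growing with the number of lags included.

```python
def partial_area(rho, max_lag):
    """Sum of the auto-correlation rho(k) over lags k = 1..max_lag."""
    return sum(rho(k) for k in range(1, max_lag + 1))

phi = 0.5     # SRD model: exponential decay rho(k) = phi**k
alpha = 0.8   # LRD model: power-law decay rho(k) = k**(alpha - 1)

def srd(k):
    return phi ** k

def lrd(k):
    return k ** (alpha - 1)

for max_lag in (10, 1_000, 100_000):
    # The SRD column converges to phi / (1 - phi) = 1; the LRD column keeps growing
    print(max_lag, round(partial_area(srd, max_lag), 6), round(partial_area(lrd, max_lag), 1))
```

The LRD sum grows roughly like n^α/α for n lags, so no finite area is ever reached.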

Determining whether a process is SRD or LRD can be done by examining
the power-laws that exist in the statistics of collected data, so long as the
data are stationary. The amount of dependence in a process can be characterized
by an exponent α, with values 0 ≤ α < 1. This exponent is calculated from the
gradient β⁵ extracted from a power-law relationship. For the trivial (random)
SRD process, α = 0. The closer α gets to 1, the longer the dependence and
therefore the longer the memory in the process. Higher values of α represent
smoother trends and less volatility.

The values of α have been restricted to the range 0 < α < 1 so that they
only test for LRD. Furthermore, this restriction implies that if a process is
scale-invariant then it is also LRD, and vice versa [130]. This is not true 
in the more general sense. Values outside this range from observed data are 
indicative of non-stationarity, noise or the existence of more complex dynamics 
not described by the presence of LRD in the data. 


⁵The relationship between α and β depends on the method of extracting the power-law.
More on this in Section 7.2.
The exponent α is often re-expressed in terms of the Hurst exponent H, the
“index of dependence”, which can in turn be directly related to fractal
dimensions such as those described in Chapter 3. So as not to confuse matters, the
relationship between H and α is taken to be α = 2H − 1, so that H = 0.5
indicates a random distribution (i.e., no memory), and values of 0.5 < H < 1
are indicative of LRD. Since α and H can always be expressed in
terms of each other, only α is used from now on.

Is LRD a desirable property? There are two ways in which the existence
of LRD may be interpreted: 


1. LRD is bad news: The presence of long memory creates many prob- 
lems in analysis, especially for short data sequences. The structure of 
the underlying system is complex and difficult to analyze because
statistics computed from data whose probability distribution obeys a power
law require an (often prohibitively) large number of data points to be
accurate. The observation time must be long enough so that at least 
some of the infrequent events occur. 


2. LRD is good news: At least there is some structure in the system. 
Consider measurements that look random, but are in fact LRD. For a 
completely random system the acquisition of more data does not make 
the system more predictable because no additional information is gained 
about the future. However, because memory exists in an LRD process 
the longer the period of observation, the more predictable it
becomes [130]. In theory it is possible to reduce the prediction error to an
arbitrarily low level by increasing the observation time accordingly. In 
practice, of course, this is often not feasible. 


The information in this section can be used to determine the predictabil- 
ity of seizures. Data that consist of only the times at which epileptic events 
start were collected from different patients spanning different intervals of 
time. These events look random, but may in fact be correlated. Deter- 
mining whether this system has memory, that is, identifying the presence 
of long-range dependence, could mean that these events are predictable. If 
no memory exists then this type of data cannot be used to predict seizures, 
although other forms of data may still be usable. 


In summary, if a power-law exists in the statistics of a function
f(t), then it displays scale-invariance. The gradient of the power
law, β, can be used to determine whether there is memory in the system
by computing the scaling exponent α. The relationship between α
and β depends on the method of computing the power law.


If α = 0, then there is no memory in the system. If 0 <
α < 1, then long-range dependence, or memory, exists in the
system. Memory or LRD can mean that a system becomes more
predictable with longer observation time. This information is
important in discerning how predictable epileptic seizures are.


Robust methods to estimate α are described next.


7.2 How to Estimate LRD 


The clinical data used for analysis (described in detail in Section 7.3) are in 
the form of discrete time points at which epileptic events start. The strength 
or duration of the events is often not known. This is called a point process,
which can be expressed equally well either as the discrete times at which events
occur or as the inter-event times. The inter-event times of these datasets look random
in nature. 
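The two equivalent representations of a point process can be converted into each other with a first difference and a cumulative sum. The following is a minimal sketch. It assumes the event times are sorted and the first event serves as the time origin. The function names are illustrative.

```python
from itertools import accumulate

def to_inter_event(times):
    """Inter-event times from sorted event start times."""
    return [b - a for a, b in zip(times, times[1:])]

def to_event_times(intervals, t0=0.0):
    """Rebuild event times from inter-event times, starting at time t0."""
    return list(accumulate(intervals, initial=t0))

times = [0.0, 2.5, 3.0, 7.0, 7.5]
gaps = to_inter_event(times)                 # [2.5, 0.5, 4.0, 0.5]
assert to_event_times(gaps, times[0]) == times
```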

This section illustrates the process of robust estimation of long range de- 
pendence and/or scale-invariance through the use of well-known generators of 
both LRD and random (trivial SRD) processes. These methods can then be 
applied to the clinical data. To generate point processes, samples are drawn
from well-known probability distributions, with and without memory.


7.2.1 Example Distributions 


For a random process, the most widely used distribution is the Gaussian or 
Normal distribution, with probability density function (PDF) 


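$$P_{\mathrm{gauss}}(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right). \qquad (7.5)$$


The probability of x lying in the range (m, n) is given by


$$P_{\mathrm{gauss}}(x \in (m,n)) = \int_{m}^{n} P_{\mathrm{gauss}}(x)\,dx. \qquad (7.6)$$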


Equation 7.5 is a model of a probability distribution. If a data sequence of
length N obeys this distribution then empirically, and for a sufficiently large
number of events, the sample mean approaches μ and the sample variance
approaches σ².

To simulate a Gaussian process we draw or generate a sequence of i.i.d.
(independent and identically distributed) random samples X[n] using this
distribution, for n = 1, 2, ..., N trials. Point processes such as the ones
in the clinical datasets can only take positive values, so negative values are
rejected⁶. Results are not affected by these transformations. These positive
values are then representative of the inter-event times. An example time
series drawn from P_gauss(x) with mean μ = 3.5 and variance σ² = 1 (both
in arbitrary units) is shown in Figure 7.3(a). Provided that the samples are
drawn independently, there is no correlation and thus no memory in this
time series. This is the example used for the trivial SRD process, with expected
α = 0.
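The Gaussian inter-event sequence described above can be sketched with Python's standard `random` module. Rejecting non-positive draws follows the text. The function name and the fixed seed are illustrative assumptions.

```python
import random

def gaussian_inter_event_times(n, mu=3.5, sigma=1.0, seed=None):
    """Draw n i.i.d. Gaussian inter-event times, rejecting non-positive draws."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        x = rng.gauss(mu, sigma)
        if x > 0:  # point processes only admit positive inter-event times
            out.append(x)
    return out

x = gaussian_inter_event_times(10_000, seed=1)
mean = sum(x) / len(x)
var = sum((v - mean) ** 2 for v in x) / (len(x) - 1)
```

With μ = 3.5 and σ = 1, fewer than 0.03% of draws are negative. The rejection step therefore barely changes the sample mean and variance.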

A stochastic LRD process is more difficult to simulate. A well accepted 
example is Fractional Gaussian Noise (fGN), derived from Fractional Brownian
Motion (fBM, also known as a random walk process). fBM is a random
discrete time series B_α[n] with n = 1, 2, ..., N. If the path is constructed as


$$B_\alpha[n] = \sum_{i=1}^{n} X[i], \qquad n = 1, 2, \ldots, N, \qquad (7.7)$$


where X[n] are randomly generated samples, and the distribution follows


$$B_\alpha[N] \sim N^{(\alpha+1)/2}\, X[1], \qquad (7.8)$$


where ∼ denotes an equality in distribution, then B_α[N] is Brownian
motion with scaling parameter α. The properties at different scales (i.e., different
N) scale according to the scaling exponent α. If X[n] are i.i.d. variables drawn
from the Gaussian distribution described by Equation 7.5, then fGN is defined as


$$F_\alpha[n] = B_\alpha[n] - B_\alpha[n-1], \qquad n = 2, 3, 4, \ldots, N. \qquad (7.9)$$


The distribution of F_α follows that of a Gaussian, with variance proportional
to the delay between samples (in this case 1) raised to the power α + 1.
When α = 0 the elements in F_{α=0} are independent, and the random
walk B_{α=0} is truly random. With 0 < α < 1 the system experiences memory,
in that if an increment in a particular direction occurs, then it is likely that
the motion in B_α continues in this direction. The larger α, the more
memory this process exhibits.
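The text does not state how the fGN sequences were generated. One exact, if O(N³), approach is to factorize the fGN covariance matrix by Cholesky decomposition and multiply the factor into i.i.d. standard normal samples. The fBM path is then the cumulative sum. This is shown here as an assumption, not as the authors' method. The sketch maps α to the Hurst exponent via H = (α + 1)/2 (from α = 2H − 1) and uses the standard fGN autocovariance. Faster generators, such as the Davies-Harte circulant-embedding method, exist for long sequences.

```python
import numpy as np

def fgn_cholesky(n, alpha, sigma=1.0, seed=None):
    """Draw n samples of fractional Gaussian noise by Cholesky factorization.

    H = (alpha + 1) / 2 and the autocovariance at lag k is
    gamma(k) = sigma^2 / 2 * (|k + 1|^(2H) - 2|k|^(2H) + |k - 1|^(2H)).
    """
    H = (alpha + 1.0) / 2.0
    k = np.arange(n, dtype=float)
    gamma = 0.5 * sigma ** 2 * (np.abs(k + 1) ** (2 * H)
                                - 2 * np.abs(k) ** (2 * H)
                                + np.abs(k - 1) ** (2 * H))
    idx = np.arange(n)
    cov = gamma[np.abs(idx[:, None] - idx[None, :])]    # Toeplitz covariance matrix
    chol = np.linalg.cholesky(cov)                       # cov = chol @ chol.T
    z = np.random.default_rng(seed).standard_normal(n)   # i.i.d. N(0, 1) samples
    return chol @ z

f_alpha = fgn_cholesky(1000, alpha=0.8, seed=0)   # fGN with alpha = 0.8
b_alpha = np.cumsum(f_alpha)                       # corresponding fBM path
```

For α = 0 the covariance reduces to the identity matrix, so the output is plain white Gaussian noise. Obtaining positive inter-event times would require adding an offset and rejecting any remaining non-positive values. The offset used for Figure 7.3 is not stated in the text.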

fGN with only positive values can be interpreted as inter-event times. F_α
is used here as the example of an LRD process. An example of fGN and fBM
with α = 0.8 and N = 1000 is shown in Figure 7.3(b) and (c) respectively.
Note that fGN looks random, even though it is slightly different from the
Gaussian process in (a). fBM, on the other hand, displays more structure.

The probability density function (PDF) of each test signal is shown next
to the corresponding simulated time series in Figure 7.3, and is compared to the
expected distribution, drawn as a solid line. The PDFs are estimated as explained by
Equation 3.42. Notice the much greater variability in the case of fBM relative to
the Gaussian and fGN PDFs. These last two converge very quickly to their
expected distributions. In contrast, the PDF of fBM is very volatile. This
variability is a consequence of the LRD present in these data: many more data
points are needed for convergence to the expected distribution. LRD processes
are notoriously difficult to simulate for this reason.


⁶Better probability distributions that generate discrete and positive point processes exist
(e.g., the Poisson distribution), but no analogous distribution exists for an LRD process.
This makes comparisons between findings more tedious. In any case, all observations made
about the random case in this section are the same regardless of whether samples are drawn
from a Gaussian or a Poisson distribution.

The existence of scale-invariance is not evident at face value in the simu- 
lated fBM and further processing is necessary. 


7.2.2 Computing α


The remainder of this section shows how α can be estimated from sequences
simulated according to the above two distributions (Gaussian and fGN). From
now on, the random Gaussian distribution, or trivial SRD case, is referred to as
having probability density function P_tSRD(x). The fGN, with α = 0.8, is referred
to as having PDF P_LRD(x). We perform analysis on data sequences of three
lengths: N = 500, N = 1000 and N = 10000 events.

The first step in determining LRD is to check whether scale-invariance is
likely. The easiest way to do this is to derive the inter-event probability histogram
(IPH), calculated as the probability that an event of a particular length (in
this case, an inter-event time) occurs⁷ [7]. For data
from an unknown distribution this requires knowledge about the maximum
resolution. In the case of the processes drawn from P_tSRD(x) and P_LRD(x),
the maximum resolution is known and the IPH can be drawn.
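The following is a minimal sketch of the IPH. It assumes the resolution (bin width) is supplied by the caller, as discussed above. The function name is hypothetical. Plotting the returned bin centres against their probabilities on logarithmic axes gives plots of the kind shown in Figure 7.4.

```python
from collections import Counter

def inter_event_histogram(intervals, resolution):
    """Inter-event probability histogram (IPH).

    Each interval is quantized to a bin of width `resolution`; the result maps
    bin centre -> fraction of intervals falling in that bin.
    """
    counts = Counter(int(x // resolution) for x in intervals)
    n = len(intervals)
    return {(b + 0.5) * resolution: c / n for b, c in sorted(counts.items())}

iph = inter_event_histogram([0.2, 0.3, 1.1, 1.4, 3.9], resolution=0.5)
# {0.25: 0.4, 1.25: 0.4, 3.75: 0.2}
```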

The IPH is drawn for the simulated point processes of different lengths in
Figure 7.4(a) and (b), where the latter is now the PDF of fBM. The histograms
are presented in a log-log plot to show that the probability of large events for
P_tSRD(x) is very low. The histogram is shown next to the expected distribution, and a
very fast (exponential) decay occurs for large events. In contrast, (b) shows
that large events occur with non-trivial probability for P_LRD(x), independent of
the number of samples N used. The histograms are drawn next to a reference
exponential decay that fits best at small event durations. Long event
durations do not decay exponentially as in (a). This slow decay is sometimes
referred to as a heavy tail (although more formal definitions exist), for which
long/large events occur with non-negligible probability [130].

From the purely mathematical point of view there is no reason why a heavy 
tailed distribution has to be LRD, but in processes derived from dynamical 
systems heavy tails generally lend themselves to the existence of long memory 
[130]. Although a heavy tail does not necessarily indicate the existence of LRD, 
LRD cannot exist without one. This first step is a simple way to determine 
whether it is worth continuing with analysis. For example, the process
generated by P_tSRD(x) can be rejected, whereas the process generated by P_LRD(x)
requires further analysis and validation.

⁷Again, probability histograms are calculated as in Equation 3.42.


[Figure 7.3 panels: (a) random (Gaussian) process; (b) fractional Gaussian noise (fGN); (c) fractional Brownian motion (fBM)]


FIGURE 7.3: Simulated time series for (a) a random (Gaussian) process, (b)
Fractional Gaussian Noise (fGN) with α = 0.8 and (c) the consequent
Fractional Brownian Motion (fBM). Upon visual inspection both (a) and (b)
look predominantly random, even though some structure exists in fGN. Next
to each figure is the probability density function (PDF) of the simulated
time series. (a) and (b) have similar PDFs that converge to their expected
distributions quickly. The PDF of fBM, on the other hand, is more volatile and
does not converge to the expected function. This is because the existence of
long memory, brought about by the introduction of the parameter α, makes
the convergence time of the simulated time series much longer. This is the
problem when long memory exists: many data points are necessary to provide
conclusive results.


FIGURE 7.4: Inter-event probability histograms (IPH) drawn on log-log plots
for both the random (α = 0) and the LRD (α = 0.8) simulated time series. (a)
shows that the IPH of random events falls exponentially, so that the probability
of large events is negligible. (b) shows that the IPH for an LRD process has
large events that occur with non-trivial probability. This is known as a heavy
tail, a property that is necessary for memory to exist.


Traditional methods of estimating the fractal exponent α include the
identification of power-laws constructed in a variety of ways. These methods are
described in great detail in references such as [18], [36] and [130], and typically 
involve higher order statistics (e.g., variance and higher) because first order 
statistics (e.g., mean) do not reveal the differences between random and LRD 
processes [130]. Variance is the most often used second order statistic. Other 
typical methods include: 


• Auto-correlation: Defined in Section 3.3.1 in Chapter 3. The
auto-correlation not only gives an idea of how far in time there is non-trivial
coupling, but can also be used directly to estimate α. If a power-law is
evident in the log-log plot of absolute auto-correlation magnitude versus
delay, then the negative of its gradient is β = 1 − α [7].


• Power Spectral Density (PSD): Defined in Section 3.3.2 in Chapter 3.
Since the auto-correlation can be related to the PSD, it follows that α
can also be estimated by analyzing the frequency content of the signal.
Long memory relates to low frequencies (and slow time scales). The
power-law generated at low frequencies in the PSD scales as β = α.


• Fano Factor: This is an estimation of the variance in the number
of events observed in a time period T. Each T is representative of a
different time scale. The gradient of this variance scales as β = α + 1
for increasing T. In the random case, the Fano factor (the count variance
divided by the mean count) does not depend on the duration of the observation,
so the variance itself grows linearly with T, resulting in β = 1 (α = 0) as
expected. Any deviation from β = 1 indicates a complexity or richness
of information not present in a process with no memory [7].
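The count-variance (Fano factor) method can be sketched in a few lines. The example below is an illustrative assumption, not the book's code. It simulates a memoryless Poisson process, counts events in non-overlapping windows of length T, and fits the gradient β of log(variance) against log(T). For this random process β should be close to 1 (α = 0), and the Fano factor should be close to 1 at every T.

```python
import math
import random

def count_statistics(event_times, window):
    """Mean, variance and Fano factor of event counts in non-overlapping windows."""
    n_windows = int(event_times[-1] // window)   # only fully observed windows
    counts = [0] * n_windows
    for t in event_times:
        i = int(t // window)
        if i < n_windows:
            counts[i] += 1
    mean = sum(counts) / n_windows
    var = sum((c - mean) ** 2 for c in counts) / (n_windows - 1)
    return mean, var, var / mean

# Memoryless (Poisson) process: exponential inter-event times with unit rate
rng = random.Random(42)
times, t = [], 0.0
for _ in range(200_000):
    t += rng.expovariate(1.0)
    times.append(t)

windows = [1, 2, 4, 8, 16, 32]
variances = [count_statistics(times, T)[1] for T in windows]

# Least-squares gradient beta of log(variance) versus log(T)
xs = [math.log(T) for T in windows]
ys = [math.log(v) for v in variances]
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
beta = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
alpha = beta - 1.0   # beta = alpha + 1 for the count variance
```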


The above examples are all simple ways of estimating the scaling exponent
α, related to β in different ways depending on the way that the power-law is
drawn⁸. However, although these methods work well under optimal conditions,
they are each fraught with their own idiosyncrasies for data series that are
short, noisy and non-stationary, for which results are difficult to interpret.

Wavelet decomposition is used as an alternative. Wavelets work well be- 
cause they are themselves scale-invariant processes, a property that is not 
true for other analysis tools [130]. They have proved effective in detecting 
the existence of many different types of long memory. Under ideal conditions 
they are not better at estimating α than the previously mentioned methods [7],
but they have the advantage of coping well with non-stationarity, provided 
that the changes are smooth enough [7, 36, 130]. 

Wavelets have been described in detail in Section 3.3.3. They isolate
activity at different frequencies and time scales and, as such, they are intuitively
related to the estimation of α through PSD-based methods, where larger
scales correspond to lower frequencies⁹. If an orthogonal mother wavelet is
used and only dyadic sampling is allowed (that is, the dilation factor is a_m = 2^m
for m = 1, 2, 3, ...), then the wavelet coefficients at different scales are only weakly
correlated. Thus the analysis at each scale is largely decoupled from all other
scales [130].

⁸The estimated value of α may not be the same as the true scaling exponent. It is the
aim of this work to find out with what confidence the value of α may be representative of the
true value.

⁹In fact, if the DB-2 (also known as the Haar wavelet) is used then the analysis can be
equivalent to the Fano factor calculation, because they both measure variance at different
time scales in similar ways. The difference is that the Fano factor does not handle the
correlation present between scales very well, whereas the wavelet methods do, provided
that sampling is dyadic or exponential (a_m = i^m for any number i) [130].


The variance s_m of the wavelet coefficients d[m, l] at scale m is defined as


$$s_m = \frac{1}{L_m} \sum_{l=1}^{L_m} \left| d[m, l] \right|^2, \qquad (7.10)$$


where there are L_m coefficients at each scale m. This is a second order
statistic that can be used to estimate α. Because each increment in m represents
an exponential (dyadic) change in bandwidth as well as center frequency, plotting
log_2(s_m) versus m is the equivalent of a log-log plot (also known as a
scalogram) used to identify the power-law. In reality a small bias g_m needs to be
introduced because of the non-linear nature of the logarithm (the expected value of
log_2(s_m) differs from log_2 of the expected value of s_m), so that the quantity
plotted versus m becomes


$$y_m = \log_2(s_m) - g_m. \qquad (7.11)$$


A good approximation of g_m, when L_m is large enough and under the
simplifying assumption that coefficients at different m are completely
decorrelated, is given by [36]


$$g_m \approx -\frac{1}{L_m \ln 2}, \qquad (7.12)$$


where ≈ denotes an approximate estimate.
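Equations 7.10 to 7.12 can be sketched with NumPy. To keep the example self-contained, the sketch swaps in a hand-written orthonormal Haar transform instead of the DB-3 wavelet used in this chapter. With PyWavelets installed, `pywt.wavedec(x, 'db3')` would supply DB-3 detail coefficients instead. The function name is hypothetical. g_m uses the large-L_m approximation of Equation 7.12, which is poor at the coarsest scales where only a few coefficients remain.

```python
import numpy as np

def haar_logscale_diagram(x):
    """Return m, y_m and L_m for every dyadic scale of an orthonormal Haar DWT.

    s_m = mean(d[m, l]^2)           (Equation 7.10)
    g_m ~ -1 / (L_m ln 2)           (Equation 7.12, large-L_m approximation)
    y_m = log2(s_m) - g_m           (Equation 7.11)
    """
    a = np.asarray(x, dtype=float)
    ms, ys, Ls = [], [], []
    m = 1
    while len(a) >= 2:
        a = a[: (len(a) // 2) * 2]                # drop a trailing odd sample
        d = (a[0::2] - a[1::2]) / np.sqrt(2.0)    # detail coefficients d[m, l]
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)    # approximation passed to scale m + 1
        L = d.size
        s = np.mean(d ** 2)
        g = -1.0 / (L * np.log(2.0))
        ms.append(m)
        ys.append(np.log2(s) - g)
        Ls.append(L)
        m += 1
    return np.array(ms), np.array(ys), np.array(Ls)

x = np.random.default_rng(0).standard_normal(2 ** 12)   # white noise: alpha = 0
m, y, L = haar_logscale_diagram(x)
gradient = np.polyfit(m[:4], y[:4], 1)[0]               # close to 0 for white noise
```

For white noise, the orthonormal Haar detail coefficients have the same variance at every scale. y_m is therefore flat, and the gradient over the first scales is close to α = 0.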

To detect scaling using y_m versus m, a region of alignment must be
identified in this plot. Not all scales need to be involved in the long memory
process, but if a sufficient number of scales line up then a power-law, which
may then be tested for LRD, is evident. In practice the region of alignment
must consist of at least 4 dyadic scales. This number is somewhat arbitrary,
seeing as memory could exist in as few as 2 scales, but it is all too easy
for up to 3 scales to line up by chance. A minimum of 4 scales is necessary
before results are taken seriously [130]. To test for LRD, the gradient in the
region of alignment can be equated to the scaling exponent, that is, β = α.

In reality the points are unlikely to line up perfectly. However, the quantity
y_m is only accurate to a certain degree of confidence that depends on the number
of available coefficients. The variance s_m is a statistic, and more data make
its estimate more accurate. Since L_m halves at each increment of m, the errors
expected at each scale increase monotonically with larger scales. An estimate
of the expected variance σ_m² of y_m can also be used to compute error bars, an
approximation of which is given by


$$\sigma_m^2 \approx \frac{2}{L_m (\ln 2)^2}. \qquad (7.13)$$


Error bars are drawn on y_m as multiples of σ_m for different confidence
levels. More data increase the number of coefficients L_m at each scale m. Thus
the variance σ_m² decreases and the error bars get smaller. A larger
L_m also makes more scales available for analysis, and results are generally
more conclusive. A small dataset can lead to two problems: (1)
the observation time may not be sufficiently long for LRD to be detectable,
or (2) error bars become large enough that LRD cannot be distinguished
from random. In (2) the method is incapable of rejecting either possibility.
Longer time series do not remove, but do reduce, the chance of these limitations
becoming significant.
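Once a region of alignment has been chosen, α can be estimated by a weighted linear fit of y_m against m, with weights 1/σ_m² taken from Equation 7.13. The sketch below is a hypothetical helper, not the Abry-Veitch code. It enforces the four-scale minimum and also returns the χ² statistic of the fit.

```python
import numpy as np

def fit_alpha(m, y, L, m_min, m_max):
    """Weighted least-squares gradient of y_m versus m over scales m_min..m_max.

    Weights are 1 / sigma_m^2 with sigma_m^2 ~ 2 / (L_m (ln 2)^2) (Equation 7.13).
    Returns (alpha estimate, its standard error, chi^2 of the fit).
    """
    m, y, L = (np.asarray(v, dtype=float) for v in (m, y, L))
    sel = (m >= m_min) & (m <= m_max)
    if sel.sum() < 4:
        raise ValueError("need at least 4 aligned dyadic scales")
    ms, ys = m[sel], y[sel]
    w = L[sel] * np.log(2.0) ** 2 / 2.0            # 1 / sigma_m^2
    # Weighted normal equations for y = slope * m + intercept
    S, Sx, Sy = w.sum(), (w * ms).sum(), (w * ys).sum()
    Sxx, Sxy = (w * ms * ms).sum(), (w * ms * ys).sum()
    delta = S * Sxx - Sx ** 2
    slope = (S * Sxy - Sx * Sy) / delta
    intercept = (Sxx * Sy - Sx * Sxy) / delta
    slope_se = np.sqrt(S / delta)
    chi2 = float((w * (ys - slope * ms - intercept) ** 2).sum())
    return slope, slope_se, chi2

m = np.arange(1, 9)
L = 4096 // 2 ** m
y = 0.8 * m + 0.3                      # synthetic, perfectly aligned scalogram
alpha_hat, alpha_se, chi2 = fit_alpha(m, y, L, 1, 8)
```

The goodness-of-fit level Q can then be obtained from the χ² distribution with (number of scales − 2) degrees of freedom, for example with `scipy.stats.chi2.sf(chi2, dof)`.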


7.2.3 Simulations 


To apply the above theory to our distributions P_tSRD(x) and P_LRD(x), the
following steps are taken:


1. Generate inter-event sequences: This is done using Equations 7.5 and 
7.9 respectively for a random (trivial SRD) and LRD process, ensuring 
that all values are positive. Three sequences of length N = 500, N = 
1000 and N = 10000 are generated for each case. 


2. Create time-series: A continuous-time process is approximated by
generating a time series in which a 0 indicates no event and a 1 indicates
an event¹⁰. In this case, the time series was arbitrarily assigned a resolution
equal to the minimum inter-event time observed in the generated sequences. Using
a finer resolution does not affect the estimates (see later analysis).


3. Use wavelet tools: Calculate y_m and σ_m for as many scales as are
available¹¹. Although the number of scales depends on the resolution of
the time series created in step 2, the number of dyadic scales available
for N = 500 is roughly 1 less than for N = 1000 and 4-5 less than for
N = 10000.


4. Calculate α: Select the interval over which, within confidence limits,
points align in the scalogram of y_m versus m. Compute α using a line
of best fit. A χ² goodness-of-fit test is computed to determine how
well the data fit this line. A confidence level Q is provided to evaluate
goodness-of-fit. A value of Q > 0.05 is deemed an acceptable fit. For
readers not familiar with χ² goodness-of-fit tests, these are explained
in [86].


5. Distinguish LRD: Determine whether the value of α falls within the range
indicative of LRD. If not, reject as noise.
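Step 2 can be sketched as follows. The sketch assumes the first event occurs at time zero and, by default, a resolution equal to the minimum inter-event time. The function name is hypothetical. Each event sets the bin containing its time to 1. With a coarser resolution, events closer together than one bin collapse into a single 1.

```python
import math

def to_binary_series(intervals, resolution=None):
    """Turn inter-event times into a 0/1 time series (1 marks an event).

    The first event is placed at time 0; by default the resolution is the
    minimum inter-event time, so no two events share a bin.
    """
    if resolution is None:
        resolution = min(intervals)
    times, t = [0.0], 0.0
    for dt in intervals:
        t += dt
        times.append(t)
    series = [0] * (math.floor(times[-1] / resolution) + 1)
    for t in times:
        series[math.floor(t / resolution)] = 1   # coarse bins may merge events
    return series

fine = to_binary_series([2.0, 1.0, 3.0])                    # resolution 1.0
coarse = to_binary_series([2.0, 1.0, 3.0], resolution=2.0)
```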


¹⁰This step is performed because inter-event times are inherently discrete-time processes,
whereas the theory presented is relevant to continuous time only. Note that there is a way to
filter the discrete-time process so that the methods can be applied directly to these data. Details
on this can be found in [36] and [130], but results in this chapter were not significantly
affected by the choice of method.

¹¹The basis for the code used in this analysis is freely available for download
at http://www.cubinlab.ee.unimelb.edu.au/~darryl/secondorder_code.html for
non-commercial purposes. The authors, Patrice Abry and Darryl Veitch, retain copyright of
this code.
FIGURE 7.5: Wavelet estimation tools are shown to reliably estimate α at
large scales for both the random and the LRD processes, even for simulations
involving relatively small numbers of events. The figures show the power-law
relationships formed between y_m and m. The expected value of α = 0 for
the random case and α = 0.8 for the LRD case is included in each plot for
comparison.


7.2.4 Results 


Figure 7.5(a) and (b) show estimates for the trivial SRD and LRD processes
respectively. The expected α = 0 for SRD and α = 0.8 for LRD are also
plotted for comparison. In both cases, even for relatively small numbers of events,
the limiting value of the gradient of y_m versus m approaches the expected α
within error bars that increase with m. All analysis here is performed using
a DB-3 wavelet (see Chapter 3).

How robust are the estimations of LRD? Several factors can affect the 
calculations: choice of resolution in step 2, noise in the data, and choice of 
wavelet type and order in step 3. We explore how each of these affects the
estimates.

First, in Figure 7.6 step 3 is applied to the LRD process for different
resolutions, ranging from low (smaller than the smallest inter-event time) to
high (larger than the smallest inter-event time) in arbitrary units. The coarser
the resolution (higher value), the fewer scales are available for analysis.
The values of y_m themselves are affected by the choice of resolution, but the
gradient α at large scales is not. This again is a result of the near-independence
of wavelet coefficients at different scales. Using a resolution finer than that
of the recorded data, although in general meaningless from an analysis point
of view, does not affect the results significantly. Furthermore, using coarser
resolutions also does not affect results, and since coarser resolutions may be
interpreted as greater tolerance to noise, the wavelet-based estimates are fairly
robust to noisy conditions. As long as the resolution used is within the
ballpark of the true resolution, wavelet estimation will not be adversely
affected by this choice.


FIGURE 7.6: LRD calculation for changes in resolution. Changing the
resolution at which α is estimated does not affect the gradient at large scales,
even though the raw values of y_m do change. Thus, although it is not valid to
use a resolution finer than that used in the recording process, so long as the
selected resolution is reasonable, analysis can continue with little thought to
its impact on estimates. This is in stark contrast to traditional estimators of
α, which are very volatile under such design choices.


How are estimates affected by missing data? Or rather, how is the analysis
affected by the removal of events? To answer this, step 3 was computed for both
random and LRD processes with events randomly removed at different rates
[67]. The results are in Figure 7.7, and they show that even for a large removal
rate of 0.7 (that is, 70% of the data missing) the trends at the larger scales
remain relatively unaffected. Given a fixed recording epoch, the uncertainty of
results grows as more data go missing, i.e., there are fewer scales available for
analysis. Of course, if the observation window were increased, this effect would
disappear.
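The two removal experiments can be sketched as below. Both functions are hypothetical illustrations. Random removal deletes each event independently with the given rate. Targeted removal is interpreted here as dropping the longest inter-event times, which is an assumption about how the "long events" were selected.

```python
import random

def remove_events(event_times, rate, seed=None):
    """Random removal: delete each event independently with probability `rate`."""
    rng = random.Random(seed)
    return [t for t in event_times if rng.random() >= rate]

def remove_longest_intervals(intervals, fraction):
    """Targeted removal (assumed interpretation): drop the longest `fraction` of
    inter-event times, keeping the remaining ones in their original order."""
    n_drop = int(round(fraction * len(intervals)))
    by_length = sorted(range(len(intervals)), key=lambda i: intervals[i], reverse=True)
    dropped = set(by_length[:n_drop])
    return [x for i, x in enumerate(intervals) if i not in dropped]

kept = remove_events(list(range(10_000)), rate=0.7, seed=0)       # about 30% survive
trimmed = remove_longest_intervals([1, 5, 2, 9, 3], fraction=0.4)  # [1, 2, 3]
```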


However, if we are selective in the types of events that are removed, by
targeting only the long events, the values of y_m at larger scales m are affected.
This is also shown in Figure 7.7 for both LRD and random processes, and it
occurs because the properties of scale-invariance for larger m depend
on the longer events, even though they occur with lower probability.


FIGURE 7.7: The estimate of α in an LRD process is robust when events
are missing from datasets. However, the targeted removal of large events
destroys this structure because LRD is dependent on their inclusion. This is
demonstrated for both the random and the LRD process above. Neither the
random nor the selective removal of long events changes the estimate of α = 0
for the random process, as seen in (a). On the other hand, the structure at
large scales is shown to suffer for the LRD process in (b) when long events
are not present.


FIGURE 7.8: Wavelet-based tools for the computation of α are robust under
smooth changes. In (a) the mean of an LRD process with α = 0.8 is modulated
with a smooth polynomial. Though the actual values of y_m change when
compared to the original, the estimated gradient α at large scales does not,
even for relatively low wavelet orders. Wavelets are also shown to be robust
under changes in the variance of data, as shown in (b), where a sharp transition
is observed halfway through the sequence. The ability of wavelets to cope
with such changes makes them more suitable for data such as those described
in Section 7.3, because these data are expected to contain such transitions.

Non-stationarities in the data may also affect the estimation of α. If there
are reasons to believe that the data are stationary, as is the case for our
simulated sequences, then the estimates of α should be taken seriously [130].
However, real data can be affected by non-stationarities. Traditional methods
of estimating LRD could not be used to analyze these cases because they
cannot cope with changes in stationarity. Wavelet tools can, provided these
changes are smooth enough. This is shown in Figure 7.8. In (a) the mean of
the events is changed (smoothly) by modulating the original time series with a
polynomial trend. Although the values of y_m are affected (as
compared to the original data), the α estimated from the gradient at large m
is not. Changes in the variance of events are abruptly introduced in the second
half of the sequence in (b), and again the estimate of α is unchanged.

These observations are supported by theory. A Daubechies wavelet of order O is capable of removing polynomial trends of order up to O − 1 [7, 130]. For example, the DB-3 (O = 3) wavelet, which has been used in all analysis so far, is capable of removing quadratic trends. Increasing the order increases the ability to cope with stronger non-stationarity. The choice of wavelet family in this type of analysis is not as important as the choice of wavelet order, so long as the wavelets used are orthogonal. Since the wavelet order can be changed quickly and easily, wavelets are once again superior to traditional estimation methods.
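The vanishing-moment property behind this claim can be checked numerically. The sketch below is illustrative only: it uses the DB-2 filter (O = 2) instead of DB-3, because DB-2 has a short closed form, and it builds the wavelet (high-pass) filter with the standard quadrature-mirror relation. Under these assumptions DB-2 should remove a linear trend (order O − 1 = 1) completely but leave a quadratic trend visible in the detail coefficients; DB-3 behaves analogously one order higher.

```python
import math

# DB-2 scaling (low-pass) filter in closed form; DB-2 has O = 2 vanishing moments.
S3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)
H = [(1 + S3) / NORM, (3 + S3) / NORM, (3 - S3) / NORM, (1 - S3) / NORM]

# Wavelet (high-pass) filter via the quadrature-mirror relation g_k = (-1)^k h_{L-1-k}.
G = [(-1) ** k * H[len(H) - 1 - k] for k in range(len(H))]


def detail_coefficients(x, step=2):
    """Correlate x with the wavelet filter G and keep every `step`-th output."""
    L = len(G)
    return [sum(G[k] * x[n + k] for k in range(L))
            for n in range(0, len(x) - L + 1, step)]


t = range(64)
linear_trend = [3.0 + 0.5 * ti for ti in t]      # polynomial of order 1 = O - 1
quadratic_trend = [float(ti * ti) for ti in t]   # polynomial of order 2 > O - 1

lin_max = max(abs(d) for d in detail_coefficients(linear_trend))
quad_max = max(abs(d) for d in detail_coefficients(quadratic_trend))
print(lin_max)   # effectively zero: only floating-point round-off remains
print(quad_max)  # about 1.22 (= sqrt(3/2)): the quadratic trend leaks into the details
```

The helper name `detail_coefficients` is hypothetical; in practice a wavelet library would supply the DB-3 filters used in the text.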

In practice the selection of the wavelet order should depend on the data, because the smoothness of the non-stationarities is unknown. The order of the wavelet is systematically increased until stable results are observed, and the lowest order giving stable results is used. Low-order wavelets are preferred because they require less data. The trade-off is between the number of scales available for analysis and the ability of higher-order wavelets to cope with more complicated forms of non-stationarity. Figure 7.8(a) shows that the introduced trend is smooth enough that selecting a higher-order wavelet does not change the results.
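The order-selection procedure just described can be written as a small rule. This is a sketch: the function name, the tolerance `tol` and the criterion that two consecutive orders must agree are assumptions, since the text judges stability by eye (compare Figure 7.11). The α estimates per order are assumed to come from a wavelet estimator run with DB-O wavelets; the example values are made up.

```python
def select_wavelet_order(alpha_by_order, tol=0.05):
    """Return the lowest wavelet order O whose alpha estimate agrees, within tol,
    with the estimate at the next higher order (a hypothetical stability rule).

    alpha_by_order maps an order O (e.g. 3, 4, 5) to the alpha estimated with DB-O.
    Returns None if no pair of consecutive orders is stable.
    """
    orders = sorted(alpha_by_order)
    for low, high in zip(orders, orders[1:]):
        if abs(alpha_by_order[low] - alpha_by_order[high]) <= tol:
            return low  # lowest stable order: keeps the most scales available
    return None


# Made-up estimates for illustration only.
print(select_wavelet_order({3: 0.41, 4: 0.43, 5: 0.42}))
```

Returning the lowest stable order reflects the trade-off above: higher orders cope with more complicated trends but leave fewer usable scales.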

Finally, because for real data it is not known whether the non-stationarity is smooth enough, the data should be checked for less obvious volatility over time. An accepted method is to break the data into segments and determine whether the values of α are consistent over time. This should be done for any estimation method, not just wavelet-based tools. This step is not shown here for the simulated sequences, as these are known to be stationary.
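The segment check can be sketched as follows. Splitting into k consecutive blocks follows the procedure used for Figure 7.13 (k = 3); the consistency criterion, that every segment's interval α ± error must overlap the full-series interval, is a hypothetical rule introduced here for illustration, and the estimator producing each (α, error) pair is left to the caller.

```python
def split_consecutive(x, k=3):
    """Split sequence x into k consecutive blocks of (nearly) equal length."""
    n = len(x)
    bounds = [i * n // k for i in range(k + 1)]
    return [x[bounds[i]:bounds[i + 1]] for i in range(k)]


def intervals_overlap(a, b):
    """True if the intervals alpha_a +/- err_a and alpha_b +/- err_b overlap."""
    (alpha_a, err_a), (alpha_b, err_b) = a, b
    return abs(alpha_a - alpha_b) <= err_a + err_b


def consistent_over_time(full_estimate, segment_estimates):
    """Hypothetical rule: every segment estimate must overlap the full-series estimate."""
    return all(intervals_overlap(full_estimate, s) for s in segment_estimates)


print(split_consecutive(list(range(10)), 3))
```

A visual judgment, as in the text, can be stricter than simple overlap; the rule above only flags gross inconsistencies.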


Wavelet tools described in Section 3.3.3 of Chapter 3 are
effective in estimating α by using the variance of the wavelet
coefficients at each scale m. Wavelet methods are robust with
respect to:
1. Resolution/Noise: The choice of resolution, and thus the allowed tolerance for noise, does not affect the estimate of α, only the number of scales available for analysis.


2. Missing Data: Even with a relatively large number of missing events (i.e., events that have gone unrecorded), estimates of α using wavelet methods are not affected.


3. Non-Stationarity: A wavelet of order O is not affected by non-stationarity in the form of polynomial trends of order less than or equal to O − 1.


The ability of wavelets to cope with non-stationary data, together with the flexibility of the parameter O, which accommodates different levels of non-stationarity, makes wavelets much more powerful than traditional methods.
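The estimator summarized above, the slope of y_m (the log2 of the mean squared wavelet coefficient at scale m) against m, can be sketched in a few lines. This is a simplified illustration, not the method of Section 3.3.3 in detail: it swaps in the Haar wavelet (order 1, so it removes only constant offsets) for DB-3, uses an unweighted least-squares fit, and omits bias correction and confidence intervals. The function names are hypothetical.

```python
import math
import random


def haar_logscale(x, max_scale):
    """Return [y_1, ..., y_max_scale], where y_m = log2(mean squared Haar detail
    coefficient at scale m), computed with an orthonormal Haar pyramid (DB-1)."""
    approx = [float(v) for v in x]
    ys = []
    for _ in range(max_scale):
        half = len(approx) // 2
        if half < 2:
            raise ValueError("series too short for the requested number of scales")
        detail = [(approx[2 * i] - approx[2 * i + 1]) / math.sqrt(2.0) for i in range(half)]
        approx = [(approx[2 * i] + approx[2 * i + 1]) / math.sqrt(2.0) for i in range(half)]
        ys.append(math.log2(sum(d * d for d in detail) / half))
    return ys


def estimate_alpha(x, m1, m2):
    """Unweighted least-squares slope of y_m against m for scales m1..m2 (inclusive)."""
    ys = haar_logscale(x, m2)
    ms = list(range(m1, m2 + 1))
    sel = [ys[m - 1] for m in ms]
    m_bar = sum(ms) / len(ms)
    y_bar = sum(sel) / len(sel)
    num = sum((m - m_bar) * (y - y_bar) for m, y in zip(ms, sel))
    den = sum((m - m_bar) ** 2 for m in ms)
    return num / den


random.seed(1)
white = [random.gauss(0.0, 1.0) for _ in range(2 ** 14)]
print(round(estimate_alpha(white, 1, 8), 2))  # near 0: white noise has no memory
```

Because the Haar detail coefficients sum to zero over each pair, adding a constant offset to the series leaves the estimate unchanged; a DB-O wavelet extends this to polynomial trends of order O − 1.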

Now that the practical aspects of detecting long memory have been discussed, these tools can be applied to real data. This is done in Section 7.4 and discussed in Section 7.5, but first a detailed description of the clinical data is provided in the next section.


7.3 Seizure Frequency Dataset 


A summary of the epilepsy data available for analysis is given in Table 7.1. There are 6 datasets of the times at which epileptic events occur, each representing a different subject (person or rat). Inter-event times can be extracted, and time series generated, with the maximum resolution shown in Table 7.1. This value refers to the best resolution at which the events were reliably recorded by the patient or extracted from an EEG time series. The resolution at which the analysis was performed may differ from this value; specific numbers are given in Section 7.4, but they are never finer than those shown here. The various time series are plotted in Figure 7.9.
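Turning a list of event times into series suitable for this analysis can be sketched as below. The function names, the choice of counting events in bins whose width equals the resolution, and the example times are assumptions for illustration; the text states only that inter-event times are extracted and series generated at a resolution no finer than that in Table 7.1.

```python
def inter_event_times(event_times):
    """Differences between consecutive event times (times assumed sorted, in one unit)."""
    return [b - a for a, b in zip(event_times, event_times[1:])]


def counts_per_bin(event_times, resolution):
    """Number of events in consecutive bins of width `resolution`, starting at the first event."""
    start = event_times[0]
    n_bins = int((event_times[-1] - start) // resolution) + 1
    counts = [0] * n_bins
    for t in event_times:
        counts[int((t - start) // resolution)] += 1
    return counts


# Hypothetical event times in days (e.g., from a seizure diary with 0.5-day resolution).
times = [0.0, 0.4, 2.1, 2.2, 5.0]
print(inter_event_times(times))
print(counts_per_bin(times, 0.5))
```

A coarser `resolution` merges nearby events into the same bin, which is one way of reducing the effect of imprecise diary timings.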

There are two types of data. Datasets 1-4 are records of epileptic events kept by the patients themselves over periods of 1 to 25 years. These records are prone to considerable noise: faulty memory may lead to erroneous event times; lapses in discipline may mean that some events are missing; only clinical epileptic events are present, since sub-clinical seizures are not detectable by the patient; and drug dosages usually change over many years, perhaps affecting the stationarity of the event frequency. Nevertheless, these are the best data of this nature that we are ever likely to have, since EEG monitoring over such long time scales is not feasible. Analysis of such data may be helped by using a coarser resolution than that shown in Table 7.1, as well as by removing time periods in which lapses in record-keeping occur (these have usually been annotated by the patient). Furthermore, one can make the perhaps unjustified assumption that changes in drug dosage alter the mean frequency or variance of events but not the intrinsic relationship between events. The use of wavelet analysis, which can remove the effect of such changes if they are smooth enough, is then justified.

Frequency of Epileptic Events: Dataset Summary

Dataset | Data source           | Duration             | No. of events | Resolution | Comments
1       | Human (diary entries) | 25 years (1982-2006) | 911           | 1 day      | Many drug dosage changes
2       | Human (diary entries) | 10 years (1997-2006) | 1050          | 1 day      | Seizures occur in clusters (665 clusters)
3       | Human (diary entries) | 5 years (2001-2006)  | 412           | 0.5 days   | Drug changes unavailable
4       | Human (diary entries) | 1 year (2005-2006)   | 397           | 15 mins    | Drug changes unavailable
5       | Human (EEG)           | 4 days               | 2722          | 1 sec      | Subject experienced an unusually large number of epileptiform discharges
6       | Rat (EEG)             | 3 hours              | 1429          | 0.5 secs   | Spontaneous epilepsy

TABLE 7.1: Seizure frequency data.

Datasets 5-6 are shorter-term sequences extracted from EEG recordings. Dataset 5 belongs to a patient who experienced an unusually large number of epileptiform discharges in the EEG. These events are not necessarily clinical. The large number of events makes this 4-day EEG recording suitable for this type of analysis. Finally, Dataset 6 is a 3-hour EEG record taken from a rat at the Shanghai Institute of Brain Functional Genomics (East China Normal University, Shanghai, China). During a surgical procedure in preparation for a different experiment, the rat developed frequent spontaneous epileptic activity. The records for the rat were taken in 1-hour periods. No drug changes occurred throughout the entire recording time. In Datasets 5-6 it was also possible to extract the duration of the events as well as their times, so the ‘strength’ of the events is known.

The short duration of both these studies, as well as the extraction of events directly from EEG records, gives them greater integrity than Datasets 1-4, because events are at most very rarely missed. Stationarity of the epileptic discharges is also more likely. Non-stationarity may be present in Dataset 5 because of day-night changes, although the next section shows that this does not affect the results.

One could argue that a typical epileptic patient is unlikely to experience this number of epileptiform discharges, or that analysis of rat EEG data may not translate to human epilepsy. However, the purpose of this chapter is to detect the presence of memory in epilepsy; thus, although the analysis may be relevant only to specific cases, it could have broader implications for our understanding of epilepsy as a whole.

In any case, all datasets are relatively short in the number of observed events. The computation of memory is therefore limited by fairly large error bars, and care is necessary in interpreting the results. The analysis of these data, and its relevance to predictability, is discussed next.

FIGURE 7.9: The inter-event times for Datasets 1-6. The type of data for each plot is summarized in Table 7.1. Each of these series looks random and non-stationary. The aim of the analysis here is to find some structure, in the form of memory, in these seemingly stochastic data.


The available data consist of records of the times at which epileptic
events occur. These data can:

• have a substantial number of missing events;

• be non-stationary due to changes in drugs, evolution of the epilepsy or
changes in sleep/awake cycles.

Missed events are not expected to be selective of length (i.e.,
both short and long events are missed), and short-term non-
stationarities due to day and night changes are expected to be
smooth changes in variance and mean. Wavelet analysis is suit-
able for detecting the presence of LRD under these conditions.


7.4 Analysis: Estimation of α


Visual inspection of the data in Figure 7.9 does not reveal any obvious pattern in the epileptic inter-event times. The data look random, although their distributions are unclear and not necessarily the same for all datasets. The data also look non-stationary: obvious changes in mean can be seen in Datasets 1, 3 and 5, whilst obvious changes in variance can be identified in Datasets 2, 4 and 5. These changes may be due to the mechanisms generating epileptic activity in each case, to external factors such as drug dosage changes, or, in the case of Dataset 5, simply to the difference between day and night. In any case, the presence of these changes indicates that wavelet analysis tools should be used.

Let us first see whether long memory may exist in the data by drawing inter-event probability histograms (IPH) for each dataset. These can be seen in Figure 7.10 as log-log plots, along with a comparative exponential (fast) decay. In all cases the probability of long inter-event times decays more slowly than exponentially, and in all cases it is also possible to identify a straight line that can be fitted to the IPH. Thus a power law exists, and this heavy tail indicates that long memory may be present. Further analysis is therefore justified for all datasets¹.


¹In principle it is possible to use this power law directly to estimate the scaling exponent α, but, as with other traditional estimation methods, the results can vary greatly with the choice of histogram intervals, and much more care needs to be taken in interpreting them.
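An IPH of the kind shown in Figure 7.10 can be sketched with logarithmically spaced bins and a straight-line fit in log-log coordinates. The bin count, the log-spaced bins, the minimum bin count and the synthetic Pareto test data are all assumptions for illustration, and the function names are hypothetical. As the footnote warns, the fitted exponent depends strongly on such choices, which is why it serves here only as a screening step.

```python
import math
import random


def log_binned_density(samples, n_bins=12, min_count=10):
    """Empirical probability density of positive samples on log-spaced bins.

    Returns (bin_centres, densities), keeping only bins with at least
    `min_count` samples so that sparse tail bins do not dominate the fit.
    """
    lo, hi = min(samples), max(samples)
    log_span = math.log(hi / lo)
    edges = [lo * math.exp(log_span * i / n_bins) for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for s in samples:
        idx = min(int(math.log(s / lo) / log_span * n_bins), n_bins - 1)
        counts[idx] += 1
    centres, densities = [], []
    for i, c in enumerate(counts):
        if c >= min_count:
            centres.append(math.sqrt(edges[i] * edges[i + 1]))  # geometric bin centre
            densities.append(c / (len(samples) * (edges[i + 1] - edges[i])))
    return centres, densities


def loglog_slope(xs, ys):
    """Least-squares slope of log10(y) against log10(x)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return num / sum((a - mx) ** 2 for a in lx)


random.seed(0)
# Synthetic heavy-tailed 'inter-event times': Pareto with density proportional to x^-(1 + 1.5).
gaps = [random.paretovariate(1.5) for _ in range(20000)]
centres, dens = log_binned_density(gaps)
print(round(loglog_slope(centres, dens), 2))  # expected to lie near -2.5
```

Changing `n_bins` or `min_count` shifts the fitted slope, which illustrates why the wavelet estimator is preferred for quantifying α.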
FIGURE 7.10: Inter-event probability histograms (IPH) for each of the 6 datasets. The probability of long inter-event times in all datasets decays more slowly than exponentially. Moreover, at least visually, all these distributions seem to obey a power law, because the data follow a linear trend (shown as a dotted line for reference). Because this power law exists, further analysis in search of memory is justified. Exponential decay is plotted for comparison.
FIGURE 7.11: Calculations of y_m using Daubechies wavelets DB-3, DB-4 and DB-5 on Datasets 1-6. These graphs validate the use of the DB-3 wavelet for further analysis, because the results are stable under changes in the wavelet order. Using wavelet orders higher than necessary would reduce the number of scales available for analysis, which is not desirable when the datasets are already so short.
Before estimating α, the analysis is quickly repeated for different orders O of the Daubechies wavelets. This is shown in Figure 7.11 for O = 3, 4 and 5. In all cases the results are stable under the choice of O; thus the smallest order (O = 3) is chosen, and the DB-3 wavelet is used throughout the remainder of the data analysis.

Figure 7.12 shows the estimated α for all datasets. Each graph shows y_m versus m and the regression line used to estimate α, as well as the range of m over which it was calculated. Also shown are the resolutions used, the estimated α including error bounds, and the goodness-of-fit parameter Q (greater than 0.05 in all cases, indicating a reasonable fit).

Recall that a value of 0 < α < 1 at large scales indicates LRD, whilst a value of α = 0 indicates no memory. The difficulty sometimes lies in selecting the correct region of alignment over which α should be estimated. Consider Dataset 4. Two regions of alignment are shown, one in which α ≈ 0 and one in which α = 0.65. Should the larger range be chosen, because the result is more likely to be correct? What if the LRD is only present at larger scales, and it is only the unavailability of data that causes the memory to go undetected? In both cases the previously mentioned minimum of 4 scales is used, so both are valid observations. However, once one examines the error bars it becomes clear that the estimate α = 0.65, although suggestive of long memory, has error bars (±0.62) that span the entire range 0 < α < 1. This means not only that the effect of any long memory cannot be estimated, but also that randomness cannot be rejected, because a result very close to α = 0 is possible. The error bars are important in determining the confidence with which results can be interpreted.
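The interpretation rules used in this section can be summarized as a small decision function. It is a sketch only: the thresholds (0 < α < 1 for LRD, a minimum of 4 scales, Q > 0.05) come from the text, but combining them into one automatic rule, treating an interval of width 1 or more as inconclusive, and the function name and labels are assumptions; in the text these judgments are made by inspecting the plots.

```python
def interpret_alpha(alpha, err, n_scales, q, min_scales=4, q_min=0.05):
    """Classify a wavelet estimate alpha +/- err using the rules of thumb in the text."""
    if n_scales < min_scales or q <= q_min:
        return "invalid fit"
    lo, hi = alpha - err, alpha + err
    if lo > 0 and hi < 1:
        return "consistent with LRD"           # interval excludes alpha = 0
    if hi - lo >= 1:
        return "inconclusive"                  # error bars span the whole LRD range
    if lo <= 0 <= hi:
        return "no memory cannot be rejected"
    return "outside the LRD range 0 < alpha < 1"


# The Dataset 4 case discussed above: alpha = 0.65 with error bars of +/-0.62.
print(interpret_alpha(0.65, 0.62, n_scales=4, q=0.2))
```

The Q value passed in the usage line is arbitrary; only the α and error values come from the discussion of Dataset 4.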

Similar observations can be made about Dataset 2, in which several regions of alignment are possible. Initially it is tempting to say that there is evidence for scale-invariance between scales 4 < m < 7. However, this dataset is known to have seizures that occur in clusters, and the time scales at which these clusters occur correspond to these scales. It could be that memory exists within the clusters, and this is what the region of alignment indicates. However, this cannot be used to infer long-range memory beyond the clusters, because α ≈ 0 is estimated for larger scales. It is important to use knowledge of the nature of the data and of the data collection techniques when interpreting results. Although it seems that a region of alignment with non-zero gradient may develop at larger scales, this is observed for only 3 values of m, too few to be taken seriously. More data would be necessary to infer memory at longer scales.

The remainder of the datasets seem a little more straightforward. An α = 0 is a logical conclusion for Datasets 2, 3 and 4, although in all cases neither the existence nor the absence of memory can be rejected, since the error bars resulting from insufficiently long data series span both α = 0 and a significant proportion of the 0 < α < 1 range. This does tell us that further analysis would be fruitless, and these data are rejected. In any case, these graphs are important in pointing out the folly of using histograms such as those in Figure 7.10 to infer LRD, as is done in much of the literature. In that figure a power law is clearly identified in Datasets 2, 3 and 4, and it is tempting to assume that, if the computed exponent were correct, this would indicate LRD. The better-defined wavelet tools were necessary to determine that the error bars are too large to draw any conclusions.

FIGURE 7.12: Calculations of α for Datasets 1-6. A clear trend is identified for Datasets 1, 3, 5 and 6, but the selection of an appropriate range from which to calculate α is less clear in Datasets 2 and 4, where different ranges yield different gradients. In Datasets 2-4 the more likely conclusion is that α = 0, that is, no memory exists. However, the small number of data points used in this analysis means that the error bars are too large to conclusively reject the existence of LRD, and further analysis using these data is futile. Datasets 1, 5 and 6, on the other hand, indicate that memory exists. Further validation tests are necessary before conclusions can be drawn.

Datasets 1, 5 and 6 are a different story, because the estimated α in all of them indicates LRD, with errors that exclude the possibility of α = 0. It would be tempting to suggest that long memory exists in all of these, but first it is necessary to evaluate how robust these results are. A stationarity test is performed for each of these 3 datasets in Figure 7.13 by breaking the time series into 3 consecutive segments and comparing the result for each segment with the originally computed α. This is particularly important for Dataset 1, which spans 25 years. Again, by using knowledge of where the data come from, we can see that a stationarity test is important, because the data are likely to be affected by countless factors over such a long time. Figure 7.13(a) reveals that this skepticism is justified, because there are large fluctuations between the 3 segments. Even though the error bars are quite large and in most cases overlap one another, the overlap is very small. Thus Dataset 1 is not very informative about whether LRD is present. Pursuing the analysis of these data is unlikely to validate or invalidate the existence of LRD, even though the non-zero gradients in the scalograms shown in Figure 7.13(a) do indicate that LRD may exist over long periods in spite of the non-stationarity.

This is not so for Datasets 5 and 6. The stationarity test shows that similar results are observed in all segments, with error bars spanning largely the same range in all cases. This suggests that the assumption of stationarity is valid and that the results can be taken seriously. Further robustness tests are shown in Figure 7.14, where events were removed at random at different removal rates. No change is observed in the estimate of α. This test is performed because, even though both these datasets are less likely to contain missing events than Datasets 1-4, the EEG records were marked manually and errors are possible. The test gives further confidence in the existence of long memory. For completeness, the same figure shows the case in which only long events are removed. Visible changes in the calculated y_m, as well as in the corresponding gradient, are observed for both datasets: long-range memory is destroyed by the removal of these events.
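The two removal experiments of Figure 7.14 can be sketched as operations on the series of events. This sketch assumes, which the text does not state explicitly, that each entry of the analyzed series is an inter-event interval, and that "long events" means the entries with the largest values; the removal rates, seed and function names are illustrative. After thinning, α would be re-estimated with the same wavelet tools.

```python
import random


def remove_random(series, rate, seed=None):
    """Drop each entry independently with probability `rate` (random missing events)."""
    rng = random.Random(seed)
    return [v for v in series if rng.random() >= rate]


def remove_longest(series, fraction):
    """Drop the `fraction` of entries with the largest values (targeted removal),
    keeping the remaining entries in their original order."""
    n_drop = int(len(series) * fraction)
    if n_drop == 0:
        return list(series)
    ranked = sorted(range(len(series)), key=lambda i: series[i], reverse=True)
    drop = set(ranked[:n_drop])
    return [v for i, v in enumerate(series) if i not in drop]


print(remove_longest([5, 1, 4, 2, 3], 0.4))
```

Random removal thins all scales uniformly, whereas targeted removal strips exactly the rare long intervals on which the large-scale structure depends, which is why only the latter changes the gradient.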

Furthermore, we can infer the length of the memory by relating the scales m at which LRD is identified to their temporal equivalents. We know the resolution and the number of events, so we can say that Dataset 5 shows memory in the (conservative) range of 16 minutes to 8 hours, and Dataset 6 in the range of 1 to 16 minutes. Longer memory may be present, but we do not have access to more data to test this.
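The conversion from scale m to a time span can be illustrated with a back-of-the-envelope calculation. The sketch assumes, and this is not stated in the text, that the analyzed series is indexed by event number, so that scale m spans about 2^m consecutive events, and it uses the mean inter-event time (duration divided by number of events, both from Table 7.1). The scale ranges passed in are hypothetical; with m from 3 to 8 for Dataset 5 and 3 to 7 for Dataset 6 they give spans of the same order as those quoted above.

```python
def scale_to_minutes(m, duration_minutes, n_events):
    """Approximate time span of scale m, assuming it covers 2**m consecutive events."""
    mean_gap = duration_minutes / n_events
    return (2 ** m) * mean_gap


# Durations and event counts from Table 7.1; the scale ranges below are hypothetical.
d5 = (4 * 24 * 60, 2722)   # Dataset 5: 4 days, 2722 events
d6 = (3 * 60, 1429)        # Dataset 6: 3 hours, 1429 events
print(scale_to_minutes(3, *d5), scale_to_minutes(8, *d5) / 60)  # minutes, hours
print(scale_to_minutes(3, *d6), scale_to_minutes(7, *d6))       # minutes, minutes
```

Rounding such spans down, as the text does with "conservative", avoids over-stating the memory length.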

It is important to note that the presence of LRD cannot be rejected for Datasets 1-4, but neither can a pure noise model. The errors caused by insufficient data (because events are rare) are large and make these data unsuitable for this type of analysis. The analysis in this section can validate an α indicative of LRD for Datasets 5 and 6 only. Interestingly, in both these datasets, which were extracted from EEG, the length of the events is known, and these lengths also follow a power law indicative of long memory. This can be seen in Figure 7.15. Although possibly useful for modeling purposes in other studies, this information does not help in determining how predictable seizures are, and is included here only for interest's sake.

FIGURE 7.13: Stationarity analysis for Datasets 1, 5 and 6. Datasets were partitioned into 3 consecutive blocks and compared with the initial estimate of α shown in Figure 7.12. Datasets 5 and 6 are likely stationary, because each partition gives an α similar to the original calculation. Dataset 1, on the other hand, is more volatile, and stationarity is unlikely. This is expected, because Datasets 5 and 6 span only hours and days, whereas Dataset 1 spans 25 years, over which many changes in the characteristics of the epilepsy are likely to occur. Because of this lack of stationarity, further analysis is unlikely to yield conclusive results, and Dataset 1 must therefore be rejected as a candidate for determining LRD.

FIGURE 7.14: An estimated α for an LRD process is robust under random removal of events. This is shown to be the case for both Datasets 5 and 6, even for relatively large removal rates. This is not the case when the removal of long events is targeted, in which case the structure at large scales is partly destroyed. These figures support the existence of LRD in Datasets 5 and 6. The original estimated α is shown in all plots for comparison.


An α value consistent with memory lasting up to hours has
been found in 2 of the 6 datasets. Memory may exist in
the processes described by the other 4 datasets, but the data
are unable to confirm or deny this, either because the recordings
are inadequate, because clinical events are not frequent
enough to provide sufficient data, or because memory
simply does not exist. It was wavelet tools that allowed these
observations to be made; other tools such as the IPH can (more
easily) lead to the wrong conclusion.


The implications of the discoveries listed here are discussed next.


7.5 Memory and Predictability of Seizures 


What do these findings tell us about how predictable seizures are? The fact that α was indicative of long memory does not make the underlying process a long-memory one. The results simply tell us that the data are consistent with a model in which LRD is present, or rather that a short-range memory model is excluded. It is always possible that the structure discussed in Section 7.4 is due to something else entirely. However, since the results have proved stationary over time, and given the nature of the data, they do tell us that a purely random model of the generation of epileptic activity is not suitable. A richness of structure exists that may be modeled suitably by stochastic processes with long memory, but the existence of another model equally capable of describing this phenomenon is not ruled out [130].

Furthermore, even if the generation of events is due to long memory, the results do not tell us what physical mechanisms are responsible for this structure. Is it a phenomenon brought about by the neural network architecture of the brain? Perhaps it is due to sensory input? More likely it has to do with both of these, and more. Thus the existence of scale-invariance has implications for the types of models chosen to simulate brain behavior: they need to be capable of reproducing this type of behavior. It should be kept in mind that the models presented in Chapter 6 do not contain any mechanism by which memory longer than a few milliseconds is possible. If these local models are accepted as suitable for small scales, then global models need to incorporate such long memory somehow.

If data from this chapter were used to develop these physical models, then only the short-term studies extracted from EEG records (e.g., Datasets 5 and 6) would be usable. In the long-term qualitative studies the number of events is small and the records are unreliable under stationarity tests. This is both good news and bad: short-term records are more easily maintained, but most epileptic patients do not experience a suitably large number of epileptic events in these time frames. Thus if prediction models are derived from this type of data there is a limit to the type of patients that can be helped, once again emphasizing the patient-specific nature of a likely predictor. In any case, even if these short-term datasets cannot be used directly to predict seizures, they can be used to provide insight and to ensure that physical models (of a larger scale than those in Chapter 6) are consistent with LRD. Such models can then be applied to the problem of prediction, an improvement on the black-box predictors available today. The relationships in the length of events briefly introduced in the previous section may also be used to this end.

FIGURE 7.15: The length of the epileptic events, available because the data were extracted from EEG records, is shown to follow a power law as well, indicative of memory in this relationship too. This is not directly applicable to determining whether epileptic events are predictable, but it is of interest in the development of models that must be able to replicate this behavior.


Another implication for the development of suitable large-scale physical models is that the dimension N of the state space in Equation 3.2 must be incredibly large, in the worst case infinite. While this is consistent with the architecture of the brain, known to be very complex, it implies (as does LRD) that prohibitively long data records are needed, and this bodes ill for the ability to predict seizures at all.


In any case, the aim of this chapter is not to develop these models; it is simply to identify the existence of scaling in order to understand whether seizures are predictable. The results show evidence of long-range memory in the system. This is apparent for studies that span hours and days, but not months or years. Perhaps, when one thinks of the capabilities of the brain, it is not surprising that such memory is possible; what is surprising is that this memory exists between epileptic events, and that it is not short.

For prediction, is this type of memory a good thing or a bad thing? The fact that memory exists at all in epilepsy, beyond very short times, is a positive discovery: at least seizures could be predictable! Had no memory been observed, there would be no clear evidence that seizures are predictable at all. Because event times are not random, information about the timing of current and past seizures does contain information about future seizures. The more information of this type that is available, the more predictable seizures may be.


This is a simplistic argument that holds only for cases in which many epileptiform bursts are observed in a short enough time period that the data are roughly stationary. Perhaps this study only tells us what type of data are unsuitable for prediction. Global events measurable by scalp or intracranial EEG are rare, and relying on such measurements is not wise. Predictors should focus on scales at which epileptic events occur more frequently. For example, if this type of memory also exists in local discharges from focal lesions, where events occur much more frequently and more data are available, then a measurement at this scale could be more suitable for prediction. Intracranial recordings using micro-electrodes that detect activity at these scales exist, but have to date largely been ignored in the problem of seizure prediction. An exception is the detection/intervention approach taken in [14].

That this memory appears to be so long (up to 8 hours!) casts further doubt on how current epilepsy predictors work, because they typically depend on short-term EEG, often restricting themselves to inter-seizure EEG. Of course, the type of data they use is different from that used here, and perhaps it is sufficient to achieve success, but given their limited performance it still seems unlikely that this information can be ignored. Why don't current techniques look at a few days rather than a few hours? In theory better baselines could be established in this way, although in practice such long sequences of data cannot be processed easily, and the types of features that are extracted are already too computationally demanding.


The identification of scaling regions in data spanning time scales 
of up to 8 hours, possibly more, implies that the generation of 
the next epileptic event relies in part on events that occurred 
a long time ago. This means that to predict seizures it may 
be necessary to include data representative of these time scales, 
and it indicates a need to use observations other than the EEG. 
Current predictors of epilepsy do not utilize such long memory. 


However, the fact that memory is likely gives hope that 
seizures are predictable. Future efforts should move toward the 
development of physical rather than black box models consistent 
with the existence of LRD discovered here. 
7.6 Conclusions


If prediction algorithms are ever to evolve beyond black-box methods, then we must look past the traditional use of EEG records, which are limited in the amount of information they can reveal about the fine-scale behavior of the brain. The work presented in this chapter supports this argument numerically, by inspecting the point processes of epileptic events at different time scales.


Epileptic event times, although seemingly random, are shown 
here to contain some structure with memory found between 
events, up to a scale of 8 hours. The type of structure found 
indicates that long range memory exists in the generators of 
epileptic seizures. 


These findings suggest that current prediction algorithms may be using 
insufficient amounts of data that do not account for the presence of such long 
memory. Future efforts must consider this, perhaps by integrating observa- 
tions other than the EEG into the process. 

Although these findings are useful, the methods used in this chapter are not themselves practical for implementing a seizure prediction algorithm: the number of data points required is often prohibitive.


For this work to be usable directly in prediction algorithms, 
research must shift to a different type of data in which epileptic 
events are very frequent. This could simply mean a reduction 
in the scale at which the EEG is recorded, perhaps using single 
microelectrodes as opposed to more global electrode arrays. 


Although these data are unlikely ever to be used as a predictor by themselves, they could feasibly be used instead as a support mechanism to make current predictors more robust. This support can be as simple as the introduction of expected prediction errors.

The identification of scaling in epilepsy could lead to an alternative route of research in which this information is used to validate or develop models capable of replicating this behavior. Interestingly, LRD was identified in both rat and human EEG data. Is it possible that we have a network-only explanation for epilepsy? If so, the brain structure may be more important than the actual dynamics. How much do chemical processes affect the generation of epileptic activity? We should use whatever clues are available to build an appropriate model of epilepsy.

These models could subsequently be used in the prediction process, and although it is not the point of this text to suggest how this may be done, a predictor, or even a detector, based on the properties of network architecture, neuron properties and other such factors has a clear advantage over traditional black-box time-series analysis. Until the generators of epileptic activity are better understood, we believe it is unlikely that implantable devices will ever move beyond the detection regime. This is of course still a workable solution for some (preventing seizures from occurring is the goal, after all), but it is unlikely to serve the wide variety of epilepsies that exist.
Concluding Remarks


The EEG provides us with a window into the brain, and was used in this book to study the detection and prediction of epileptic seizures. From a physical-principles point of view, the EEG measures voltages that are due to electric charges distributed across the volume contained by the skull. There is interference from other electromagnetic field sources outside the skull, but in the general environment of the recording equipment it can be minimized (although not completely eliminated). The EEG measurement reflects the instantaneous charge distribution across the skull volume. The temporal changes in the EEG signal are due to the motion of these charges as a consequence of brain activity, as well as to artifacts such as muscular and ocular movement. The primary purpose of the EEG is to capture brain activity while ignoring the artifacts.

Further interpretation of the typical current distribution in the brain,
making use of the physiology of the neuronal structure, leads to the model of
current dipoles as the primary source of the EEG signal. This in turn reveals
that the extracranial or scalp EEG is indeed a very blunt instrument that
only records a measurable voltage deviation when large groups of neurons act
in a coordinated manner. This coordination can be a consequence of the
synchronized activity of neurons, or of a “chance” alignment of activity in a
much larger population of neurons. From the brain geometry it follows that
the scalp EEG measures cortical activity, as the cortex effectively shields
the rest of the brain from the EEG instrumentation. The recording from an
electrode pair is typically influenced by cortical activity within a radius of
about 2.5 cm around each electrode. The scalp EEG is an observation of brain
activity at a truly macroscopic scale.
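
The dipole picture can be made concrete with the standard expression for the potential of a current dipole p in an infinite homogeneous volume conductor, V = p·R / (4πσ|R|³), where R is the vector from the dipole to the measurement point. The sketch below assumes a conductivity of σ = 0.33 S/m and a 10 nA·m dipole; both values are illustrative assumptions, and the model ignores the skull and scalp layers that attenuate and smear the potential in a real head.

```python
import math

def dipole_potential(p, r0, r, sigma=0.33):
    """Potential (V) at point r of a current dipole p (A*m) located at r0,
    in an infinite homogeneous conductor with conductivity sigma (S/m)."""
    R = [ri - r0i for ri, r0i in zip(r, r0)]          # vector from dipole to r
    dist = math.sqrt(sum(c * c for c in R))
    dot = sum(pc * rc for pc, rc in zip(p, R))        # p . R
    return dot / (4.0 * math.pi * sigma * dist ** 3)

p = (0.0, 0.0, 10e-9)        # hypothetical 10 nA*m dipole along the z-axis
origin = (0.0, 0.0, 0.0)
for d_cm in (1.0, 2.0, 4.0):
    v = dipole_potential(p, origin, (0.0, 0.0, d_cm / 100.0))
    print(f"{d_cm:.0f} cm along the dipole axis: {v * 1e6:.2f} uV")
```

Each doubling of the distance along the axis divides the potential by four, and points in the plane perpendicular to the dipole see no potential at all, which is consistent with the observation that only large, coherently active groups of neurons register at the scalp.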

When we analyze EEG records it is important to realize that voltages are
only defined up to a constant, and hence only make sense as a voltage
difference between two points in space. Moreover, the recording equipment
(the location of the electrodes in particular) and the skull geometry play a
significant role in determining the magnitude of the measured signal. Given a
recording over a period of time, it therefore makes sense to normalize the
signal by removing the mean (setting it to zero) and scaling the signal so
that its standard deviation is one. Neither feature, the mean nor the
variance, carries intrinsic meaning.
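
A minimal sketch of this normalization (often called z-scoring), assuming a channel is available as a list of voltage samples; the sample values are hypothetical. Because the offset and gain are removed, two recordings that differ only by a constant shift and a scale factor become identical.

```python
import statistics

def normalize(x):
    """Return a zero-mean, unit-standard-deviation copy of the samples in x."""
    mu = statistics.fmean(x)
    sd = statistics.pstdev(x, mu)
    if sd == 0:
        raise ValueError("a constant signal cannot be scaled to unit variance")
    return [(v - mu) / sd for v in x]

# Hypothetical channel samples (arbitrary units)
raw = [120.0, 135.0, 110.0, 150.0, 125.0]
z = normalize(raw)
print([round(v, 3) for v in z])
```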


From a complexity point of view it is equally clear that the typical scalp
EEG is a blunt instrument. The brain contains on the order of 100 billion
neurons, whereas a typical EEG record may contain on the order of 100
parallel voltage recordings. There is simply not enough information in the
EEG to reconstruct the intricate behavior of the brain. Depth electrodes that
record the activity of single neurons may indeed help to unravel brain
behavior at this microscopic level. But even the meso-scopic scale behavior
of a few thousand neurons in a cortical column remains elusive, regardless of
which EEG method is used. Much more work remains to be done from a modeling
point of view.

Nevertheless, the EEG provides a very useful window into brain activity. It
is relatively straightforward to extract features that clearly distinguish
between sleep and awake conditions, regardless of the individual being
measured. The sleep and awake conditions undoubtedly imply differences in
behavior at the macro-scopic scale, and these do show up clearly in any EEG
recording.

However, identifying the EEG features that allow one to detect epileptic
episodes and differentiate them from normal brain behavior is more intricate.
Epilepsy is a collective term for a very large group of brain disorders that
manifest themselves in a variety of ways in the EEG. Not only does epilepsy
vary between different people, but in the same patient one may also observe
differences between epileptic episodes, even within the same EEG record.

Using machine learning technology, an expert system can be constructed that
mimics, and indeed rivals, the performance of a clinician in deciding which
epochs of an EEG recording correspond to epileptic behavior. Any such expert
system consists of three phases: first signal pre-conditioning, next feature
extraction (typically using windowed data), and finally detection. Which
features are relevant depends on the application, and they are typically
learned from annotated EEG recordings. Patient-independent and
patient-specific detection can be catered for in much the same way. Detector
performance can be expressed using metrics such as the true positive versus
false positive detection rates. Excellent detector results can be obtained
with features that track synchronization across a number of recording
channels. This necessarily implies that the detected seizure affects a
significant area of the cortex (occupying a large proportion of the spatial
neighborhood of the relevant electrodes).
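
The three phases can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the detector developed in this book: the single feature (line length, the sum of absolute sample-to-sample differences) and the fixed threshold are hypothetical stand-ins for features and decision rules that would be learned from annotated recordings, and the test signal is synthetic.

```python
import math
import statistics

def precondition(x):
    """Phase 1: remove the mean and scale to unit standard deviation."""
    mu = statistics.fmean(x)
    sd = statistics.pstdev(x, mu) or 1.0   # guard against a flat signal
    return [(v - mu) / sd for v in x]

def windows(x, size, step):
    """Split x into windows of `size` samples whose starts are `step` apart."""
    return [x[i:i + size] for i in range(0, len(x) - size + 1, step)]

def line_length(w):
    """Phase 2 (hypothetical feature): sum of absolute successive differences."""
    return sum(abs(b - a) for a, b in zip(w, w[1:]))

def detect(features, threshold):
    """Phase 3: flag every window whose feature value exceeds the threshold."""
    return [f > threshold for f in features]

def rates(predicted, annotated):
    """Per-window true positive rate and false positive rate."""
    tp = sum(1 for p, a in zip(predicted, annotated) if p and a)
    fp = sum(1 for p, a in zip(predicted, annotated) if p and not a)
    pos = sum(1 for a in annotated if a)
    neg = len(annotated) - pos
    return (tp / pos if pos else 0.0), (fp / neg if neg else 0.0)

# Synthetic record sampled at 256 Hz: 2 s of slow, low-amplitude "background"
# followed by 1 s of fast, high-amplitude "seizure-like" activity.
fs = 256
signal = [math.sin(2 * math.pi * 2 * n / fs) for n in range(2 * fs)]
signal += [4 * math.sin(2 * math.pi * 20 * n / fs) for n in range(fs)]

features = [line_length(w) for w in windows(precondition(signal), 128, 128)]
annotation = [False] * 4 + [True] * 2      # hypothetical labels per 0.5 s window
tpr, fpr = rates(detect(features, threshold=10.0), annotation)
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")
```

In a real detector the threshold (or a full classifier) would be tuned on annotated data, and the rates would be reported over many records rather than a single synthetic one.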

Intracranial EEG shows greater potential for detecting epileptic epochs, at
the cost of a rather unwelcome major surgical intervention. In the context of
focal epilepsy, the proximity of the intracranial EEG electrodes to the focus
allows the acquisition of cleaner signals, which makes it easier to learn and
to interpret signal features.

Detection of an epileptic episode is in general feasible with only a minor
delay from the epileptic onset (in the EEG record). In the case of
intracranial EEG, detection may therefore occur even before behavioral
clinical symptoms become observable. This is unlikely to be the case for scalp
EEG because of its lack of spatial specificity: onset and detection typically
only become visible once the seizure has progressed significantly and affects
a large portion of the cortex. Early detection opens the possibility of
intervention. This is an important topic for further research, with great
clinical promise.

Prediction of epileptic seizures would of course be even better. However, is
there a clearly identifiable pre-seizure state in the EEG? Research to date is
inconclusive. From a dynamical systems point of view there are two very
different ways to interpret epilepsy. First, one can consider epilepsy as the
brain's behavior after a bifurcation. A particular condition in the brain
changes over time, and the corresponding brain dynamics switch dramatically
from a normal to an epileptic state of behavior once this condition exceeds a
certain threshold. Alternatively, it may be that epilepsy is just the behavior
of the brain in another part of the state space, a part that a normally
functioning brain does not visit. The former could well be a predictable
phenomenon, provided the particular brain condition is identifiable from the
EEG signal. However, the latter model of epilepsy almost defies prediction
from a purely EEG point of view, and would best be approached using EEG
together with other measurements. Despite these dynamical differences, both
modes of operation are entirely compatible with detection of epileptic
activity using the EEG.
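
The two interpretations can be illustrated with one-dimensional toy systems. These are textbook examples chosen for this sketch (a pitchfork normal form and a bistable double-well system), not models of epilepsy. In the first, a slowly varying parameter mu plays the role of the brain condition, and the behavior changes qualitatively only when mu crosses the threshold 0. In the second, two stable states coexist for the same parameters, and only a perturbation of the state (not a parameter change) moves the system from one to the other.

```python
import math

def simulate(f, x0, dt=0.01, steps=5000):
    """Integrate dx/dt = f(x) with forward Euler steps; return the final state."""
    x = x0
    for _ in range(steps):
        x += dt * f(x)
    return x

def pitchfork(mu):
    """dx/dt = mu*x - x**3: the rest state x = 0 loses stability when mu > 0."""
    def f(x):
        return mu * x - x ** 3
    return f

def bistable(x):
    """dx/dt = x - x**3: stable states x = -1 and x = +1 for fixed parameters."""
    return x - x ** 3

# Scenario 1: the same small perturbation, before and after the bifurcation.
for mu in (-0.5, 0.5):
    print(f"mu = {mu:+.1f}: settles at x = {simulate(pitchfork(mu), x0=0.1):.4f}")

# Scenario 2: identical parameters, different starting states, different outcomes.
print(simulate(bistable, x0=-0.5), simulate(bistable, x0=0.5))
```

In the first scenario, tracking mu would warn of the transition before it happens (for mu > 0 the state settles at the square root of mu); in the second, nothing in the parameters announces the switch, which mirrors the case in which EEG-only prediction becomes very hard.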

In order to elucidate the question of the predictability of epilepsy, the
notion of functional memory between epileptic events is useful. If successive
epileptic events were not functionally related, prediction would be futile.
Early evidence points to long-range dependence in the time series of
epileptic onsets. This is positive in that predictability is not outright
falsified. It is, however, something of a Pyrrhic victory, because long-range
dependence requires long observation times, and moreover it is indicative of
the enormous difficulty one faces in predicting long intervals between
successive epileptic events. Further work is called for. It appears essential
to analyze (long) EEG time series for long-range dependence and associated
power laws in epileptic onset as well as seizure severity to shed more light
on this.
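
One simple way to look for long-range dependence in a series, such as successive inter-seizure intervals, is the aggregated-variance estimate of the Hurst exponent H: the variance of the means of non-overlapping blocks of size m scales as m^(2H-2), so H follows from the slope of a log-log fit. H close to 0.5 indicates short-range memory, while H clearly above 0.5 suggests long-range dependence. The sketch below is one of several possible estimators (wavelet-based methods are another), not necessarily the method used in this book, and the test data are synthetic white noise.

```python
import math
import random
import statistics

def slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def hurst_aggvar(series, block_sizes):
    """Aggregated-variance estimate of the Hurst exponent H.

    For each block size m the series is cut into non-overlapping blocks and
    the variance of the block means is computed; Var ~ m**(2H - 2).
    """
    log_m, log_var = [], []
    for m in block_sizes:
        k = len(series) // m
        means = [statistics.fmean(series[i * m:(i + 1) * m]) for i in range(k)]
        log_m.append(math.log(m))
        log_var.append(math.log(statistics.pvariance(means)))
    return 1.0 + slope(log_m, log_var) / 2.0

# Synthetic check: independent Gaussian samples have no memory, so H is near 0.5.
rng = random.Random(0)
white = [rng.gauss(0.0, 1.0) for _ in range(2 ** 14)]
print(round(hurst_aggvar(white, [4, 8, 16, 32, 64, 128]), 2))
```

Even with 16,384 samples the estimate carries some statistical scatter; a record of inter-seizure intervals is usually far shorter, which illustrates why long observation times are needed.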

Despite its nearly 100-year history, our collective understanding of the EEG
remains elementary, not least because of the enormous complexity of the
brain it deals with. A great many questions have been answered, but many
interesting open problems remain.
action potential The response of a neuron to the integration of information
incoming from the dendrites. The action potential propagates through 
the cell's axon. 


AED See anti-epileptic drug. 


aliasing A type of measurement error introduced when the sampling rate is 
lower than twice the highest frequency found in a signal. 
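
A quick numerical illustration, assuming a hypothetical sampling rate of 100 Hz (Nyquist frequency 50 Hz): a 60 Hz cosine produces the same samples as a 40 Hz cosine, so after sampling the two cannot be told apart.

```python
import math

fs = 100.0                     # sampling rate in Hz (hypothetical)
f_true = 60.0                  # above the Nyquist frequency fs / 2 = 50 Hz
f_alias = abs(f_true - round(f_true / fs) * fs)   # apparent frequency: 40 Hz

n_samples = range(20)
true_samples = [math.cos(2 * math.pi * f_true * n / fs) for n in n_samples]
alias_samples = [math.cos(2 * math.pi * f_alias * n / fs) for n in n_samples]
max_diff = max(abs(a - b) for a, b in zip(true_samples, alias_samples))
print(f_alias, max_diff)       # the sequences agree up to floating-point rounding
```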


animal model A model of a disorder or pathology that uses a living but 
non-human subject to replicate the symptoms. 


ANN See artificial neural network. 


anti-epileptic drug (AED) A drug prescribed to stop or minimize the num- 
ber of seizures. 


artifact Activity measured by the EEG that does not originate in the brain. 
For example, the movements of the eye generate an artifact that inter- 
feres with the EEG. 


artificial neural network (ANN) A type of classifier based on a network 
of mathematical units designed to mimic the activity of a biological 
neuron. 


association rules A type of classifier based on simple relationships between 
the outputs of a feature extractor. 


auto-correlation A statistical quantity that estimates how similar a signal
is to time-shifted copies of itself, that is, how much it repeats itself over
time.
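
A minimal sketch of the (biased, mean-removed) sample auto-correlation, normalized so that r[0] = 1. The sinusoid with a period of 20 samples is a hypothetical test signal: its auto-correlation is strongly negative at half a period and peaks again at a lag of one full period.

```python
import math

def autocorr(x, max_lag):
    """Normalized sample auto-correlation r[k] for lags k = 0..max_lag."""
    n = len(x)
    mu = sum(x) / n
    d = [v - mu for v in x]                  # remove the mean
    c0 = sum(v * v for v in d)               # lag-0 value used for normalization
    return [sum(d[i] * d[i + k] for i in range(n - k)) / c0
            for k in range(max_lag + 1)]

x = [math.sin(2 * math.pi * n / 20) for n in range(200)]
r = autocorr(x, 40)
print(round(r[10], 3), round(r[20], 3))
```

The peak at lag 20 is slightly below 1 because fewer terms overlap at larger lags, a known property of this biased estimator.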


axon A neural fiber that conducts action potentials, the outputs generated 
by a neuron’s cell body. 


bifurcation An abrupt change or split of activity into a different mode of 
behavior. In mathematics it is a very rigorously defined concept. 


bioelectromagnetism The interaction of electric and magnetic fields in bi- 
ological tissue. 


black box dynamic model A mathematical model of a system based only 
on measured data and not on its underlying physical construct. 
capacitance A quantity that describes the ability of a body to store charge.


cell body The part of a neuron that integrates the inputs coming from the 
dendrites to generate the outputs projected by the axon. 


central nervous system (CNS) The part of the nervous system, comprising
the brain and the spinal cord, whose neurons together determine behavior.


cerebral cortex A 1-2mm thick sheet of highly interconnected neurons that 
form the outer layer of the brain. See also cortical fold. 


cerebrospinal fluid (CSF) The clear fluid found inside and surrounding 
the brain. 


charge A fundamental electric property of sub-atomic particles that
determines the electric field they generate and their interaction with other
charges. Charge can be positive or negative.


chemical current See current. 


classifier A method that identifies the class that a set of extracted features
belongs to. See for example ANN, SVM and association rules.


CNS See central nervous system. 


computational noise An error introduced into the estimate of a statistic 
because of insufficient data or algorithmic issues. 


conductivity A quantity that describes the ability of a material to conduct 
or transmit charge, defined per unit length of the material. 


correlation dimension A non-linear statistical quantity that estimates the 
non-integer dimension of a system. It gives an idea of the complexity of 
the system generating the signal. 


cortical column A small region of brain containing roughly 50,000 neurons 
that behave similarly to one another. It is believed to be a functional 
unit of the brain. A cortical column is sometimes referred to as a macro- 
column because a smaller mini-column consisting of approximately 100 
neurons has also been proposed. 


cortical fold The natural folds of the cortex. The exposed curved part of a
fold is known as a gyrus, while the grooves between folds are known as sulci
(singular sulcus).


cortico-cortical projection Neural connectivity between different parts of 
the cortex. 


cross-correlation A statistical quantity that estimates how similar two sig- 
nals are to one another. 
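
The cross-correlation can be used to find the lag at which two signals are most similar, for example a propagation delay between two channels. This sketch uses the raw (un-normalized) cross-correlation and a synthetic pair in which the second signal is the first delayed by 5 samples; the function names are hypothetical.

```python
import random

def xcorr(x, y, k):
    """Raw cross-correlation at lag k: sum over n of x[n] * y[n + k]."""
    return sum(x[n] * y[n + k] for n in range(len(x)) if 0 <= n + k < len(y))

def best_lag(x, y, max_lag):
    """Lag in [-max_lag, max_lag] at which the cross-correlation is largest."""
    return max(range(-max_lag, max_lag + 1), key=lambda k: xcorr(x, y, k))

rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(300)]
y = [0.0] * 5 + x[:-5]          # y[n] = x[n - 5]: a delayed copy of x
print(best_lag(x, y, 20))
```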
current The movement of charged particles. When the particles are electrons
the current is known as an electric current. When the particles are 
charged ions the current is known as a chemical current. 


current dipole The electric field generated by the combination of a (the- 
oretical) current source with a current sink. Current dipoles are most 
commonly used to approximate average electric activity in a small vol- 
ume of brain. 


DBS See deep-brain stimulation. 


deep-brain stimulation (DBS) The application of electric charge to re- 
gions of the brain beneath the cortex. 


dendrite Branched fibers that form the inputs to a neuron’s cell body. 
detection The correct identification of activity after it occurs. 


dimension In mathematics dimension typically refers to the minimum num- 
ber of co-ordinates required to describe a system. Different definitions 
exist (see, for example, correlation dimension) but all give an idea of 
the complexity of the system. 


dipole The combination of two charges, typically used in reference to its 
generated electric field. If the two charges are electric then it is known 
as an electric dipole. 


dynamic model A mathematical representation of the behavior of a system. 


EEG See electroencephalogram. 
electric current See current. 
electric dipole See dipole. 


electric field The vector representation of the electric force generated by a 
particular charge distribution. 


electric potential Also known as voltage, it is the scalar potential energy
per unit charge, that is, the work needed to move a unit charge between two
points. The vector electric field is the negative gradient of the electric
potential.


electrical stimulation The application of electric charge to stimulate activ- 
ity. 


electricity The movement of electric charge. 
electrode A conductive material that enables recordings of electrical activity.
In the EEG the electrodes can be placed on the scalp, on the cortex, or 
in deeper brain structures. 


electroencephalogram (EEG) A measurement system that records the time- 
evolving voltages generated by the brain. 


entropy A non-linear statistical quantity that estimates how unpredictable
an observed signal is (low entropy corresponds to high redundancy). It gives
an idea of the complexity of the system generating the signal.


epilepsy A neurological disorder characterized by repeated occurrences of 
seizures. 


EPSP See post-synaptic potential. 


expert system A sophisticated method of selecting and combining the out- 
puts of both the feature extraction and classifier stages to optimize the 
performance of a detector. 


extra-cellular fluid The fluid found outside a cell. 


false positive When an incorrect detection is made by a classifier. 
false positive rate (FPR) See specificity. 


fast Fourier transform (FFT) An efficient algorithm for computing the
discrete Fourier transform of a digital signal.


feature extraction The estimation of statistics from a signal, typically after 
windowing. In a detector these features are passed to the classifier. 


FFT See fast Fourier transform. 


filter A mathematical transform that selectively removes (in practice,
attenuates) components of a signal. For example, a low pass filter removes
the frequencies above its cut-off frequency from a signal.


Fourier transform A mathematical transform that converts a signal from 
time domain to frequency domain. 


FPR See false positive rate. 


frequency domain The representation of signals by their frequency compo- 
nents. 


gray matter Nerve tissue primarily consisting of neurons. 


gyrus See cortical fold. 
histology A description of the micro-scopic cellular structure of tissue.

homogeneous medium A material whose properties (e.g., electrical) are the
same at every point; in models it is often also assumed to extend infinitely
in all directions.

in vitro A biological experiment conducted outside the living organism, for
example on cultured cells or on tissue removed from the body.

in vivo A biological experiment conducted in live tissue. 


inhomogeneous medium A material whose properties (e.g., electrical) change 
over space. 


input The signals that are fed into a system, used to determine its
outputs.


integrate-and-fire unit A simplified mathematical model of a single neuron
that integrates its inputs until a threshold is reached, then emits a spike
and resets, rather than modeling complex anatomical details.
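
A common variant is the leaky integrate-and-fire unit, sketched below with hypothetical, dimensionless parameter values (time in ms): the membrane potential v relaxes toward rest while integrating the input current, and whenever it reaches the threshold a spike time is recorded and v is reset. A constant input whose steady-state potential stays below threshold produces no spikes, while a stronger one produces regular firing.

```python
def lif_spike_times(current, dt=0.1, tau=10.0, r=1.0,
                    v_rest=0.0, v_th=1.0, v_reset=0.0):
    """Leaky integrate-and-fire unit: tau * dv/dt = -(v - v_rest) + r * I(t).

    `current` holds one input value per time step of length dt (ms). Returns
    the times (ms) at which v reaches the threshold v_th; v is then reset.
    """
    v = v_rest
    spikes = []
    for step, i_in in enumerate(current):
        v += dt / tau * (-(v - v_rest) + r * i_in)   # forward Euler update
        if v >= v_th:
            spikes.append(step * dt)
            v = v_reset
    return spikes

weak = lif_spike_times([0.8] * 1000)    # steady-state potential 0.8 < threshold
strong = lif_spike_times([1.5] * 1000)  # steady-state potential 1.5 > threshold
print(len(weak), len(strong))
```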


intra-cellular fluid The fluid found inside a cell. 


intra-cranial EEG EEG measured by electrodes placed beneath the skull, 
either on the cortex (cortical EEG) or in deeper structures (depth EEG). 


ion gate Channels that allow the transmission of ions between the inside and 
outside of neurons. The gates are typically triggered open by chemical, 
electrical or physical processes. 


IPSP See post-synaptic potential. 
linear system A system whose mathematical descriptors are linear, that is,
they are additive and homogeneous. Their responses obey the principle
of superposition.
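
The superposition principle can be checked numerically, as sketched below for two hypothetical systems: a moving-average filter (linear) and a squaring operation (non-linear). Passing the check for particular inputs is only a necessary condition; linearity must hold for all inputs and all scaling factors.

```python
def moving_average(x, width=3):
    """Linear system: each output is the mean of the last `width` inputs
    (samples before the start of x are treated as zero)."""
    return [sum(x[max(0, n - width + 1):n + 1]) / width for n in range(len(x))]

def square(x):
    """Non-linear system: each output sample is the square of the input sample."""
    return [v * v for v in x]

def obeys_superposition(system, x1, x2, a=2.0, b=-3.0, tol=1e-9):
    """Check system(a*x1 + b*x2) == a*system(x1) + b*system(x2) for these inputs."""
    combined = system([a * u + b * v for u, v in zip(x1, x2)])
    separate = [a * u + b * v for u, v in zip(system(x1), system(x2))]
    return all(abs(p - q) < tol for p, q in zip(combined, separate))

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [0.0, 1.0, 0.0, -1.0, 0.0]
print(obeys_superposition(moving_average, x1, x2))
print(obeys_superposition(square, x1, x2))
```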


linearization The approximation of a non-linear function by a linear one, 
valid only around the point at which the linearization is performed. 


long range dependence (LRD) A relationship where the current value of 
a signal/object depends on values that occurred far away in time or 
space. Also known as long range memory. 


long range memory See long range dependence. 
LRD See long range dependence. 


Lyapunov exponent A non-linear statistical quantity that estimates how
fast initially nearby trajectories of a system diverge from each other. It
gives an idea of the complexity of the system generating the signal.
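
For a one-dimensional map x_{n+1} = f(x_n) the exponent can be computed as the orbit average of ln|f'(x_n)|. The sketch uses the logistic map, a standard textbook test case rather than an EEG model; estimating exponents from an EEG record would first require reconstructing a state space from the data, which is beyond this sketch. A negative value indicates that nearby orbits converge, a positive value that they diverge.

```python
import math

def lyapunov_logistic(r, x0=0.2, n=20000, transient=100):
    """Lyapunov exponent of x -> r*x*(1-x), as the orbit average of ln|f'(x)|."""
    x = x0
    for _ in range(transient):                  # discard the initial transient
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n):
        total += math.log(abs(r * (1.0 - 2.0 * x)))   # f'(x) = r*(1 - 2x)
        x = r * x * (1.0 - x)
    return total / n

print(lyapunov_logistic(2.5))   # orbit settles on a stable fixed point
print(lyapunov_logistic(4.0))   # chaotic regime: nearby orbits diverge
```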


macro-column See cortical column. 
macro-scopic model A mathematical model that describes activity at the
macro-scopic scales, which in the brain refers to modeling the activity 
in centimeters of brain tissue. 


macro-scopic scale Refers to activity of large regions of the brain, spanning 
scales of cm rather than mm. 


map (mathematical) The mathematical relationship between two quanti- 
ties. 


mean A statistical quantity that estimates the average value of a set of data. 


measurement A record of the activity of a system. For example, see elec- 
troencephalogram. 


measurement error A noise or error introduced to a recorded signal be- 
cause of the inability of measurement equipment to measure activity to 
infinite precision. 


membrane potential The electric potential between the intra-cellular and 
extra-cellular fluid. 


meso-scopic scale Refers to the activity of a small network of around 50,000 
neurons found in the mm scale (see cortical column). 


micro-scopic scale Refers to the activity of a few neurons found in the μm
scale.


mini-column See cortical column. 


monopole A single charge, typically used in reference to its generated electric
field.


neural network The complex connectivity between (up to) billions of neu- 
rons. Neural networks are necessary for the central nervous system to 
function. 


neuromodulator A type of neurotransmitter that acts more diffusely and 
affects large populations of neurons simultaneously. Like neurotransmit- 
ters, there are many different types of neuromodulators. 


neuron A cell that is the basic functional unit of the central nervous system. 


neurotransmitter A chemical used to facilitate communication between 
neurons (at the synapses) in response to an incoming action potential. 
There are many different types of neurotransmitters. 


non-linear system A system whose mathematical descriptors are not additive,
not homogeneous, or both. Their responses do not obey the principle of
superposition. (See linear system.)
normalization The removal of the mean and scale dependence of a signal
so that it can be compared with other signals recorded using different
methods.


onset delay A performance metric that represents the delay between the 
onset of an event and its detection by a classifier. 

output The activity projected outside of a system in response to its inputs 
as well as its internal processes. 

parameter A relatively stationary quantity used in a mathematical expres- 
sion to describe properties of a system. 

PDF See probability distribution function. 


permittivity A quantity that describes the ability of a material to store 
charge, defined per unit length of the material. 


phenomenological model A mathematical model that describes the phe- 
nomena of the observed signal rather than the underlying physical com- 
ponents. In this book the phenomenological models are used to describe 
the macro- or meso-scopic potentials recorded by the EEG. 


physical dynamic model A mathematical model of a system based on the 
underlying physical construct (e.g., physiology) of the system. 


post-synaptic potential (PSP) The response of a cell to an incoming ac- 
tion potential. It can be excitatory (EPSP) or inhibitory (IPSP). EPSPs 
promote further transmission whilst IPSPs inhibit it. 


power law A relationship of the form y = c·x^a, which appears as a straight
line (with slope a) in a log-log plot.


power spectral density (PSD) The squared magnitude of the Fourier
transform of a signal (suitably normalized), describing how the signal power
is distributed over frequency.
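
A minimal periodogram estimate of the PSD, using a direct O(N²) DFT for transparency (an FFT would be used in practice). The scaling |X[k]|²/(fs·N), with the bins between DC and the Nyquist frequency doubled to fold in negative frequencies, is one common convention and an assumption here; with it, the PSD summed over frequency (times the bin width fs/N) equals the mean signal power. The 8 Hz test sinusoid is hypothetical.

```python
import cmath
import math

def periodogram(x, fs):
    """One-sided periodogram of x sampled at fs Hz; returns (freqs, psd)."""
    n = len(x)
    freqs, psd = [], []
    for k in range(n // 2 + 1):
        xk = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        p = abs(xk) ** 2 / (fs * n)
        if 0 < k < n - k:          # fold in the matching negative-frequency bin
            p *= 2.0
        freqs.append(k * fs / n)
        psd.append(p)
    return freqs, psd

fs = 64.0
x = [math.sin(2 * math.pi * 8.0 * t / fs) for t in range(64)]
freqs, psd = periodogram(x, fs)
peak = freqs[psd.index(max(psd))]
print(peak)
```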


prediction The forecasting of activity in the future. 


preprocessing The initial stages of a detector used to normalize and stan- 
dardize a signal prior to feature extraction. 


probability distribution function (PDF) A function that describes the 
probability of each value of a stochastic variable occurring. 


PSD See power spectral density. 


PSP See post-synaptic potential. 


quantization The process of converting an analog signal into a digital signal 
by dividing the signal space into a finite number of bins. 
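
A minimal uniform quantizer, assuming a hypothetical 3-bit converter with an input range of -1 to 1 (8 levels): each sample is clipped to the range and rounded to the nearest level, so the quantization error never exceeds half a step.

```python
def quantize(x, n_bits, v_min, v_max):
    """Round each sample to the nearest of 2**n_bits evenly spaced levels."""
    step = (v_max - v_min) / (2 ** n_bits - 1)
    out = []
    for v in x:
        v = min(max(v, v_min), v_max)       # clip to the converter's input range
        out.append(v_min + round((v - v_min) / step) * step)
    return out

samples = [i / 100 for i in range(-100, 101)]
q = quantize(samples, n_bits=3, v_min=-1.0, v_max=1.0)
step = 2.0 / 7
print(len(set(q)), max(abs(a - b) for a, b in zip(samples, q)))
```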
receiver operating characteristic (ROC) curve A visualization of the
performance of a detector obtained by plotting the true-positive rate against
the false-positive rate as the detection threshold is varied.
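
A minimal sketch of how an ROC curve is traced: each distinct detector score is used in turn as the threshold, and the resulting (false-positive rate, true-positive rate) pairs are the points of the curve; the area under it (AUC) summarizes the detector in a single number. The scores and labels are hypothetical.

```python
def roc_points(scores, labels):
    """(FPR, TPR) points obtained by sweeping a threshold over the scores."""
    pos = sum(1 for lab in labels if lab)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, lab in zip(scores, labels) if s >= thr and lab)
        fp = sum(1 for s, lab in zip(scores, labels) if s >= thr and not lab)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]           # hypothetical detector outputs
labels = [True, True, False, True, False, False]  # hypothetical annotations
pts = roc_points(scores, labels)
print(pts[-1], round(auc(pts), 3))
```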


resistivity The reciprocal of conductivity, that is, the ability of a material to 
impede the transmission of charge. 


ROC See receiver operating characteristic. 


sampling The conversion of a continuous signal into a sequence of discrete 
values, typically performed in the measurement process. The rate at 
which samples are acquired is known as the sampling frequency. 


sampling frequency / sampling rate See sampling. 
scalar A quantity that has only magnitude (as opposed to a vector). 


scale-invariance When the properties of a signal/object are the same when 
analyzed at different temporal/spatial scales. Scale-invariance is an in- 
stance of self-similarity. 


scalogram A plot of the wavelet scale number versus a logarithmic quantity. 
It can be used to identify power laws. 


scalp EEG EEG measured by electrodes placed on the scalp. 


seizure A temporary impairment to normal brain function where there is 
excessive and highly synchronous firing of action potentials. A seizure 
can affect the entire brain (generalized seizure) or parts of the brain 
(partial/focal seizure). 


self-similarity When the properties of an arbitrary entity are the same when 
it is looked at as a whole or in parts. 


sensitivity A performance metric that represents the rate of correct
detections made by a classifier. Also known as true positive rate.


sensory reticular nucleus (SRN) A sub-system of the thalamus responsi- 
ble for relaying sensory input onto the cortex. 


short-range memory A relationship where the current value of a signal/object 
depends on recent previous values. 


signal A collection of measurements taken from a system over time. 


signal processing The analysis of signals used to extract statistical infor- 
mation. 


spatial filter A filter that operates on the spatial (as opposed to temporal) 
domain. 
specificity A performance metric that represents the rate at which a
classifier correctly rejects non-events (true negatives). It equals one minus
the false positive rate.
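
Under the standard confusion-matrix definitions, sensitivity is TP/(TP+FN), specificity is TN/(TN+FP), and the false positive rate FP/(TN+FP) is one minus the specificity. The counts below are hypothetical.

```python
def detection_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and false positive rate from confusion counts."""
    sensitivity = tp / (tp + fn)     # true positive rate
    specificity = tn / (tn + fp)     # true negative rate
    fpr = fp / (tn + fp)             # false positive rate = 1 - specificity
    return sensitivity, specificity, fpr

# Hypothetical counts for 100 annotated windows (10 seizure, 90 non-seizure)
sens, spec, fpr = detection_metrics(tp=9, fn=1, tn=81, fp=9)
print(sens, spec, fpr)
```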


spectrogram The computation of the power spectral density over time using 
windowed signals. 


spike A discrete representation of an action potential. 
SRN See sensory reticular nucleus. 


stability analysis The process of determining whether a system is stable or 
unstable. 


stable system When the steady state of a system is finite; that is, when
system responses that are close at a point in time remain close over
indefinite periods of time.


standard deviation The square root of variance. 


stationarity When a quantity (e.g., a parameter) does not change its values 
or properties over time. 


statistic A quantity (e.g., a variable) whose average properties are
understood as belonging to an ensemble of data.


steady state The resting state of the system in response to a constant input. 


stochastic A random quantity or variable that may be described using a 
probability distribution function. 


sub-system A smaller system that can be combined with other sub-systems 
to describe the entire system. 


sulcus See cortical fold. 


support vector machine (SVM) A type of classifier that identifies the
boundary in the observed feature space that best separates the classes, with
the largest possible margin; the training vectors closest to this boundary are
the support vectors.


SVM See support vector machine. 


synapse The connecting region between axons and dendrites that allows 
transmission of information between neurons. 


synchronization When events at different locations occur at the same time. 


system A description of an entity, such as the brain, that by its physiological 
or physical make-up constrains the signals that are generated. The 
outputs of a system are determined by its internal structure as well as 
its inputs. 
thalamic reticular nucleus (TRN) A sub-system of the thalamus respon-
sible for regulating the amount of sensory input relayed onto the cortex. 


thalamo-cortical projection Neural connectivity between thalamus and 
cortex. 


thalamus A region of the brain found sub-cortically that is responsible, 
among other things, for regulation of sensory input into the cortex. 
It is also believed to be responsible for synchronizing activity between 
different parts of the cortex. 


time domain The representation of signals over time. 

TPR See true positive rate. 

TRN See thalamic reticular nucleus. 

true positive When a correct detection is made by a classifier. 


true positive rate (TPR) See sensitivity. 


unstable system When the system is not stable, i.e., there are system re- 
sponses that are close at an instant in time, and then diverge arbitrarily 
far from each other in the future. 


vagal nerve stimulation (VNS) The application of electric charge to the 
vagus nerve, typically used to reduce the number of epileptic seizures. 


variable A quantity, typically time-evolving, used in a mathematical expres- 
sion to describe properties of a system. 


variance A statistical quantity that describes the spread of data around
their mean.


vector A quantity that has both magnitude and direction (as opposed to a 
scalar). 


VNS See vagal nerve stimulation. 
voltage See electric potential. 


volume conductor A medium, typically modeled in 3 dimensions, that con- 
ducts electric charge according to its electric properties: conductivity, 
resistivity and permittivity. 


wavelet analysis An analysis method that extracts both frequency and time 
domain information from a signal using specialized mathematical func- 
tions called wavelets. 


white matter Nerve tissue primarily consisting of connecting fibers rather 
than neurons. 
windowing The partitioning of a signal into smaller segments so that its
temporal evolution can be analyzed. The applied windows may be rect- 
angular or non-rectangular (e.g., Hanning window, Hamming window). 
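
A minimal sketch of windowing with a Hann (raised-cosine) window, assuming 256-sample segments with 50% overlap; the segment length, overlap and constant test signal are illustrative choices. Tapering each segment to zero at its edges reduces the spectral leakage that abrupt rectangular cuts introduce.

```python
import math

def hann(n):
    """Symmetric Hann window of length n: 0.5 * (1 - cos(2*pi*i / (n - 1)))."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n - 1))) for i in range(n)]

def windowed_segments(x, size, step):
    """Cut x into segments of `size` samples, `step` apart, tapered by a Hann window."""
    w = hann(size)
    return [[xi * wi for xi, wi in zip(x[s:s + size], w)]
            for s in range(0, len(x) - size + 1, step)]

signal = [1.0] * 1000                                   # hypothetical signal
segs = windowed_segments(signal, size=256, step=128)    # 50% overlap
print(len(segs), segs[0][0], round(max(segs[0]), 4))
```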